Preface to the second and third editions


Since the publication of the first edition, many students and lectur- 
ers have communicated a number of minor typos and other corrections 
to me. There was also some demand for a hardcover edition of the 
texts. Because of this, the publishers and I have decided to incorporate 
the corrections and issue a hardcover second edition of the textbooks. 
The layout, page numbering, and indexing of the texts have also been 
changed; in particular the two volumes are now numbered and indexed 
separately. However, the chapter and exercise numbering, as well as the 
mathematical content, remains the same as the first edition, and so the 
two editions can be used more or less interchangeably for homework and 
study purposes. 

The third edition contains a number of corrections that were reported 
for the second edition, together with a few new exercises, but is otherwise 
essentially the same text. 


Preface to the first edition 


This text originated from the lecture notes I gave teaching the honours 
undergraduate-level real analysis sequence at the University of Califor- 
nia, Los Angeles, in 2003. Among the undergraduates here, real anal- 
ysis was viewed as being one of the most difficult courses to learn, not 
only because of the abstract concepts being introduced for the first time 
(e.g., topology, limits, measurability, etc.), but also because of the level 
of rigour and proof demanded of the course. Because of this percep- 
tion of difficulty, one was often faced with the difficult choice of either 
reducing the level of rigour in the course in order to make it easier, or 
to maintain strict standards and face the prospect of many undergradu- 
ates, even many of the bright and enthusiastic ones, struggling with the 
course material. 

Faced with this dilemma, I tried a somewhat unusual approach to 
the subject. Typically, an introductory sequence in real analysis assumes 
that the students are already familiar with the real numbers, with math- 
ematical induction, with elementary calculus, and with the basics of set 
theory, and then quickly launches into the heart of the subject, for in- 
stance the concept of a limit. Normally, students entering this sequence 
do indeed have a fair bit of exposure to these prerequisite topics, though 
in most cases the material is not covered in a thorough manner. For in- 
stance, very few students were able to actually define a real number, or 
even an integer, properly, even though they could visualize these num- 
bers intuitively and manipulate them algebraically. This seemed to me 
to be a missed opportunity. Real analysis is one of the first subjects 
(together with linear algebra and abstract algebra) that a student en- 
counters, in which one truly has to grapple with the subtleties of a truly 
rigorous mathematical proof. As such, the course offered an excellent 
chance to go back to the foundations of mathematics, and in particular
the opportunity to do a proper and thorough construction of the real
numbers. 

Thus the course was structured as follows. In the first week, I de- 
scribed some well-known “paradoxes” in analysis, in which standard laws 
of the subject (e.g., interchange of limits and sums, or sums and inte- 
grals) were applied in a non-rigorous way to give nonsensical results such 
as 0 = 1. This motivated the need to go back to the very beginning of the 
subject, even to the very definition of the natural numbers, and check 
all the foundations from scratch. For instance, one of the first homework 
assignments was to check (using only the Peano axioms) that addition 
was associative for natural numbers (i.e., that (a +b) +c=a+(b+c) 
for all natural numbers a,b,c: see Exercise 2.2.1). Thus even in the 
first week, the students had to write rigorous proofs using mathematical 
induction. After we had derived all the basic properties of the natural 
numbers, we then moved on to the integers (initially defined as formal 
differences of natural numbers); once the students had verified all the 
basic properties of the integers, we moved on to the rationals (initially 
defined as formal quotients of integers); and then from there we moved 
on (via formal limits of Cauchy sequences) to the reals. Around the 
same time, we covered the basics of set theory, for instance demonstrat- 
ing the uncountability of the reals. Only then (after about ten lectures) 
did we begin what one normally considers the heart of undergraduate 
real analysis - limits, continuity, differentiability, and so forth. 

The response to this format was quite interesting. In the first few 
weeks, the students found the material very easy on a conceptual level, 
as we were dealing only with the basic properties of the standard num- 
ber systems. But on an intellectual level it was very challenging, as one 
was analyzing these number systems from a foundational viewpoint, in 
order to rigorously derive the more advanced facts about these number 
systems from the more primitive ones. One student told me how difficult 
it was to explain to his friends in the non-honours real analysis sequence 
(a) why he was still learning how to show why all rational numbers 
are either positive, negative, or zero (Exercise 4.2.4), while the non- 
honours sequence was already distinguishing absolutely convergent and 
conditionally convergent series, and (b) why, despite this, he thought 
his homework was significantly harder than that of his friends. Another 
student commented to me, quite wryly, that while she could obviously 
see why one could always divide a natural number n into a positive 
integer q to give a quotient a and a remainder r less than q (Exercise 
2.3.5), she still had, to her frustration, much difficulty in writing down
a proof of this fact. (I told her that later in the course she would have
to prove statements for which it would not be as obvious to see that 
the statements were true; she did not seem to be particularly consoled 
by this.) Nevertheless, these students greatly enjoyed the homework, as 
when they did persevere and obtain a rigorous proof of an intuitive fact,
it solidified the link in their minds between the abstract manipulations 
of formal mathematics and their informal intuition of mathematics (and 
of the real world), often in a very satisfying way. By the time they were 
assigned the task of giving the infamous “epsilon and delta” proofs in 
real analysis, they had already had so much experience with formalizing 
intuition, and in discerning the subtleties of mathematical logic (such 
as the distinction between the “for all” quantifier and the “there exists” 
quantifier), that the transition to these proofs was fairly smooth, and we 
were able to cover material both thoroughly and rapidly. By the tenth 
week, we had caught up with the non-honours class, and the students 
were verifying the change of variables formula for Riemann-Stieltjes in- 
tegrals, and showing that piecewise continuous functions were Riemann 
integrable. By the conclusion of the sequence in the twentieth week, we 
had covered (both in lecture and in homework) the convergence theory of 
Taylor and Fourier series, the inverse and implicit function theorem for 
continuously differentiable functions of several variables, and established 
the dominated convergence theorem for the Lebesgue integral. 

In order to cover this much material, many of the key foundational 
results were left to the student to prove as homework; indeed, this was 
an essential aspect of the course, as it ensured the students truly ap- 
preciated the concepts as they were being introduced. This format has 
been retained in this text; the majority of the exercises consist of proving 
lemmas, propositions and theorems in the main text. Indeed, I would 
strongly recommend that one do as many of these exercises as possible 
- and this includes those exercises proving “obvious” statements - if one 
wishes to use this text to learn real analysis; this is not a subject whose 
subtleties are easily appreciated just from passive reading. Most of the 
chapter sections have a number of exercises, which are listed at the end 
of the section. 

To the expert mathematician, the pace of this book may seem some- 
what slow, especially in early chapters, as there is a heavy emphasis 
on rigour (except for those discussions explicitly marked “Informal”),
and justifying many steps that would ordinarily be quickly passed over 
as being self-evident. The first few chapters develop (in painful detail) 
many of the “obvious” properties of the standard number systems, for
instance that the sum of two positive real numbers is again positive
(Exercise 5.4.1), or that given any two distinct real numbers, one can find
a rational number between them (Exercise 5.4.5). In these foundational
chapters, there is also an emphasis on non-circularity - not using later, 
more advanced results to prove earlier, more primitive ones. In partic- 
ular, the usual laws of algebra are not used until they are derived (and 
they have to be derived separately for the natural numbers, integers, 
rationals, and reals). The reason for this is that it allows the students 
to learn the art of abstract reasoning, deducing true facts from a lim- 
ited set of assumptions, in the friendly and intuitive setting of number 
systems; the payoff for this practice comes later, when one has to utilize 
the same type of reasoning techniques to grapple with more advanced 
concepts (e.g., the Lebesgue integral). 

The text here evolved from my lecture notes on the subject, and 
thus is very much oriented towards a pedagogical perspective; much 
of the key material is contained inside exercises, and in many cases I 
have chosen to give a lengthy and tedious, but instructive, proof in- 
stead of a slick abstract proof. In more advanced textbooks, the student 
will see shorter and more conceptually coherent treatments of this ma- 
terial, and with more emphasis on intuition than on rigour; however, 
I feel it is important to know how to do analysis rigorously and “by 
hand” first, in order to truly appreciate the more modern, intuitive and 
abstract approach to analysis that one uses at the graduate level and 
beyond. 

The exposition in this book heavily emphasizes rigour and formal- 
ism; however this does not necessarily mean that lectures based on 
this book have to proceed the same way. Indeed, in my own teach- 
ing I have used the lecture time to present the intuition behind the 
concepts (drawing many informal pictures and giving examples), thus 
providing a complementary viewpoint to the formal presentation in the 
text. The exercises assigned as homework provide an essential bridge 
between the two, requiring the student to combine both intuition and 
formal understanding together in order to locate correct proofs for a 
problem. This I found to be the most difficult task for the students, 
as it requires the subject to be genuinely learnt, rather than merely 
memorized or vaguely absorbed. Nevertheless, the feedback I received 
from the students was that the homework, while very demanding for 
this reason, was also very rewarding, as it allowed them to connect the 
rather abstract manipulations of formal mathematics with their innate 
intuition on such basic concepts as numbers, sets, and functions. Of
course, the aid of a good teaching assistant is invaluable in achieving this
connection. 

With regard to examinations for a course based on this text, I would 
recommend either an open-book, open-notes examination with problems 
similar to the exercises given in the text (but perhaps shorter, with no 
unusual trickery involved), or else a take-home examination that involves 
problems comparable to the more intricate exercises in the text. The 
subject matter is too vast to force the students to memorize the defini- 
tions and theorems, so I would not recommend a closed-book examina- 
tion, or an examination based on regurgitating extracts from the book. 
(Indeed, in my own examinations I gave a supplemental sheet listing the 
key definitions and theorems which were relevant to the examination 
problems.) Making the examinations similar to the homework assigned 
in the course will also help motivate the students to work through and 
understand their homework problems as thoroughly as possible (as op- 
posed to, say, using flash cards or other such devices to memorize mate- 
rial), which is good preparation not only for examinations but for doing 
mathematics in general. 

Some of the material in this textbook is somewhat peripheral to 
the main theme and may be omitted for reasons of time constraints. 
For instance, as set theory is not as fundamental to analysis as are 
the number systems, the chapters on set theory (Chapters 3, 8) can be 
covered more quickly and with substantially less rigour, or be given as 
reading assignments. The appendices on logic and the decimal system 
are intended as optional or supplemental reading and would probably 
not be covered in the main course lectures; the appendix on logic is 
particularly suitable for reading concurrently with the first few chapters. 
Also, Chapter 5 (on Fourier series) is not needed elsewhere in the text 
and can be omitted. 

For reasons of length, this textbook has been split into two volumes. 
The first volume is slightly longer, but can be covered in about thirty 
lectures if the peripheral material is omitted or abridged. The second 
volume refers at times to the first, but can also be taught to students 
who have had a first course in analysis from other sources. It also takes 
about thirty lectures to cover. 

I am deeply indebted to my students, who over the progression of 
the real analysis course corrected several errors in the lecture notes
from which this text is derived, and gave other valuable feedback. I am 
also very grateful to the many anonymous referees who made several 
corrections and suggested many important improvements to the text.
I also thank Biswaranjan Behera, Tai-Danae Bradley, Brian, Eduardo
Ulrich Groh, Bart Kleijngeld, Erik Koelink, Wang Kuyyang, Matthis

Chapter 1


Metric spaces 


1.1 Definitions and examples 


In Definition 6.1.5 we defined what it meant for a sequence $(x_n)_{n=m}^\infty$ of
real numbers to converge to another real number $x$; indeed, this meant
that for every $\varepsilon > 0$, there exists an $N \geq m$ such that $|x - x_n| \leq \varepsilon$ for
all $n \geq N$. When this is the case, we write $\lim_{n \to \infty} x_n = x$.

Intuitively, when a sequence $(x_n)_{n=m}^\infty$ converges to a limit $x$, this
means that somehow the elements $x_n$ of that sequence will eventually
be as close to $x$ as one pleases. One way to phrase this more precisely
is to introduce the distance function $d(x,y)$ between two real numbers
by $d(x,y) := |x - y|$. (Thus for instance $d(3,5) = 2$, $d(5,3) = 2$, and
$d(3,3) = 0$.) Then we have


Lemma 1.1.1. Let $(x_n)_{n=m}^\infty$ be a sequence of real numbers, and let $x$
be another real number. Then $(x_n)_{n=m}^\infty$ converges to $x$ if and only if
$\lim_{n \to \infty} d(x_n, x) = 0$.


Proof. See Exercise 1.1.1. 


One would now like to generalize this notion of convergence, so that 
one can take limits not just of sequences of real numbers, but also se- 
quences of complex numbers, or sequences of vectors, or sequences of 
matrices, or sequences of functions, even sequences of sequences. One 
way to do this is to redefine the notion of convergence each time we 
deal with a new type of object. As you can guess, this will quickly get
tedious. A more efficient way is to work abstractly, defining a very general
class of spaces - which includes such standard spaces as the real
numbers, complex numbers, vectors, etc. - and define the notion of convergence
on this entire class of spaces at once. (A space is just the set
of all objects of a certain type - the space of all real numbers, the space
of all $3 \times 3$ matrices, etc. Mathematically, there is not much distinction
between a space and a set, except that spaces tend to have much more
structure than what a random set would have. For instance, the space of
real numbers comes with operations such as addition and multiplication,
while a general set would not.)

© Springer Science+Business Media Singapore 2016 and Hindustan Book Agency 2015
T. Tao, Analysis II, Texts and Readings in Mathematics 38,
DOI 10.1007/978-981-10-1804-6_1

It turns out that there are two very useful classes of spaces which do 
the job. The first class is that of metric spaces, which we will study here. 
There is a more general class of spaces, called topological spaces, which 
is also very important, but we will only deal with this generalization 
briefly, in Section 2.5. 

Roughly speaking, a metric space is any space X which has a concept 
of distance d(x,y) - and this distance should behave in a reasonable 
manner. More precisely, we have 


Definition 1.1.2 (Metric spaces). A metric space $(X,d)$ is a space $X$
of objects (called points), together with a distance function or metric
$d : X \times X \to [0,+\infty)$, which associates to each pair $x, y$ of points in $X$
a non-negative real number $d(x,y) \geq 0$. Furthermore, the metric must
satisfy the following four axioms:

(a) For any $x \in X$, we have $d(x,x) = 0$.
(b) (Positivity) For any distinct $x, y \in X$, we have $d(x,y) > 0$.
(c) (Symmetry) For any $x, y \in X$, we have $d(x,y) = d(y,x)$.
(d) (Triangle inequality) For any $x, y, z \in X$, we have $d(x,z) \leq d(x,y) + d(y,z)$.


In many cases it will be clear what the metric d is, and we shall abbre- 
viate (X,d) as just X. 
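The four axioms can be tested mechanically on a finite sample of points. The following is a minimal sketch, not part of the text: the helper name `check_metric_axioms` and the sample points are illustrative assumptions. Passing the check on a sample is only evidence, not a proof, but a failure does exhibit a genuine counterexample.

```python
from itertools import product

def check_metric_axioms(points, d, tol=1e-12):
    """Return the labels of the axioms (a)-(d) of Definition 1.1.2 that fail on `points`."""
    failed = set()
    for x, y in product(points, repeat=2):
        if x == y and abs(d(x, y)) > tol:
            failed.add("a")  # d(x, x) must equal 0
        if x != y and not d(x, y) > 0:
            failed.add("b")  # distinct points must be a positive distance apart
        if abs(d(x, y) - d(y, x)) > tol:
            failed.add("c")  # symmetry
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            failed.add("d")  # triangle inequality
    return sorted(failed)

sample = [-2.0, 0.0, 0.5, 3.0]
# The standard metric |x - y| passes every axiom on this sample.
print(check_metric_axioms(sample, lambda x, y: abs(x - y)))
# The squared difference (x - y)^2 breaks the triangle inequality,
# e.g. 25 = d(-2, 3) > d(-2, 0) + d(0, 3) = 4 + 9.
print(check_metric_axioms(sample, lambda x, y: (x - y) ** 2))
```

The second call shows why the triangle inequality is a real restriction: a function can be non-negative, symmetric, and vanish exactly on the diagonal without being a metric.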


Remark 1.1.3. The conditions (a) and (b) can be rephrased as follows:
for any $x, y \in X$ we have $d(x,y) = 0$ if and only if $x = y$. (Why is this
equivalent to (a) and (b)?)


Example 1.1.4 (The real line). Let $\mathbf{R}$ be the real numbers, and let
$d : \mathbf{R} \times \mathbf{R} \to [0,\infty)$ be the metric $d(x,y) := |x - y|$ mentioned earlier. Then
$(\mathbf{R},d)$ is a metric space (Exercise 1.1.2). We refer to $d$ as the standard
metric on $\mathbf{R}$, and if we refer to $\mathbf{R}$ as a metric space, we assume that the
metric is given by the standard metric $d$ unless otherwise specified.


Example 1.1.5 (Induced metric spaces). Let $(X,d)$ be any metric
space, and let $Y$ be a subset of $X$. Then we can restrict the metric
function $d : X \times X \to [0,+\infty)$ to the subset $Y \times Y$ of $X \times X$ to create
a restricted metric function $d|_{Y \times Y} : Y \times Y \to [0,+\infty)$ of $Y$; this
is known as the metric on $Y$ induced by the metric $d$ on $X$. The pair
$(Y, d|_{Y \times Y})$ is a metric space (Exercise 1.1.4) and is known as the subspace
of $(X,d)$ induced by $Y$. Thus for instance the metric on the real line in
the previous example induces a metric space structure on any subset of
the reals, such as the integers $\mathbf{Z}$, or an interval $[a,b]$, etc.


Example 1.1.6 (Euclidean spaces). Let $n \geq 1$ be a natural number,
and let $\mathbf{R}^n$ be the space of $n$-tuples of real numbers:

$$\mathbf{R}^n = \{(x_1, x_2, \ldots, x_n) : x_1, \ldots, x_n \in \mathbf{R}\}.$$

We define the Euclidean metric (also called the $l^2$ metric)
$d_{l^2} : \mathbf{R}^n \times \mathbf{R}^n \to \mathbf{R}$ by

$$d_{l^2}((x_1,\ldots,x_n),(y_1,\ldots,y_n)) := \sqrt{(x_1-y_1)^2 + \ldots + (x_n-y_n)^2} = \left(\sum_{i=1}^n (x_i-y_i)^2\right)^{1/2}.$$

Thus for instance, if $n = 2$, then $d_{l^2}((1,6),(4,2)) = \sqrt{3^2 + 4^2} = 5$. This
metric corresponds to the geometric distance between the two points
$(x_1,x_2,\ldots,x_n)$, $(y_1,y_2,\ldots,y_n)$ as given by Pythagoras’ theorem. (We
remark however that while geometry does give some very important examples
of metric spaces, it is possible to have metric spaces which have
no obvious geometry whatsoever. Some examples are given below.) The
verification that $(\mathbf{R}^n, d_{l^2})$ is indeed a metric space can be seen geometrically
(for instance, the triangle inequality now asserts that the length
of one side of a triangle is always less than or equal to the sum of the
lengths of the other two sides), but can also be proven algebraically (see
Exercise 1.1.6). We refer to $(\mathbf{R}^n, d_{l^2})$ as the Euclidean space of dimension
$n$. Extending the convention from Example 1.1.4, if we refer to $\mathbf{R}^n$
as a metric space, we assume that the metric is given by the Euclidean
metric unless otherwise specified.
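As a small illustration of the definition, here is a minimal sketch of the Euclidean metric on tuples of numbers. The function name `d_l2` is chosen here for illustration only; it reproduces the computation for the points (1,6) and (4,2) from the example.

```python
import math

def d_l2(x, y):
    """Euclidean (l^2) distance between two points given as equal-length tuples."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(d_l2((1, 6), (4, 2)))  # sqrt(3^2 + 4^2) = 5.0
```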


Example 1.1.7 (Taxi-cab metric). Again let $n \geq 1$, and let $\mathbf{R}^n$ be
as before. But now we use a different metric $d_{l^1}$, the so-called taxicab
metric (or $l^1$ metric), defined by

$$d_{l^1}((x_1,x_2,\ldots,x_n),(y_1,y_2,\ldots,y_n)) := |x_1 - y_1| + \ldots + |x_n - y_n| = \sum_{i=1}^n |x_i - y_i|.$$

Thus for instance, if $n = 2$, then $d_{l^1}((1,6),(4,2)) = 3 + 4 = 7$. This
metric is called the taxi-cab metric, because it models the distance a
taxi-cab would have to traverse to get from one point to another if the
cab was only allowed to move in cardinal directions (north, south, east,
west) and not diagonally. As such it is always at least as large as the
Euclidean metric, which measures distance “as the crow flies”, as it were.
We claim that the space $(\mathbf{R}^n, d_{l^1})$ is also a metric space (Exercise 1.1.7).
The metrics are not quite the same, but we do have the inequalities

$$d_{l^2}(x,y) \leq d_{l^1}(x,y) \leq \sqrt{n}\, d_{l^2}(x,y) \qquad (1.1)$$

for all $x, y$ (see Exercise 1.1.8).
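The inequalities (1.1) can be spot-checked numerically before one attempts the proof. The sketch below is only a sanity check on randomly chosen points (the names `d_l1`, `d_l2`, the seed, and the small floating-point tolerance are assumptions made for this illustration); it does not replace Exercise 1.1.8.

```python
import math
import random

def d_l1(x, y):
    """Taxi-cab (l^1) distance: sum of coordinate-wise absolute differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def d_l2(x, y):
    """Euclidean (l^2) distance."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(d_l1((1, 6), (4, 2)))  # 3 + 4 = 7

rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    n = rng.randint(1, 6)
    x = [rng.uniform(-10, 10) for _ in range(n)]
    y = [rng.uniform(-10, 10) for _ in range(n)]
    l1, l2 = d_l1(x, y), d_l2(x, y)
    # (1.1): d_l2 <= d_l1 <= sqrt(n) * d_l2, up to rounding error
    assert l2 <= l1 + 1e-9
    assert l1 <= math.sqrt(n) * l2 + 1e-9
```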


Remark 1.1.8. The taxi-cab metric is useful in several places, for in- 
stance in the theory of error correcting codes. A string of n binary digits 
can be thought of as an element of $\mathbf{R}^n$, for instance the binary string
10010 can be thought of as the point $(1,0,0,1,0)$ in $\mathbf{R}^5$. The taxi-cab
distance between two binary strings is then the number of bits in the
two strings which do not match, for instance $d_{l^1}(10010, 10101) = 3$. The
goal of error-correcting codes is to encode each piece of information (e.g., 
a letter of the alphabet) as a binary string in such a way that all the 
binary strings are as far away in the taxicab metric from each other as 
possible; this minimizes the chance that any distortion of the bits due 
to random noise can accidentally change one of the coded binary strings 
to another, and also maximizes the chance that any such distortion can 
be detected and correctly repaired. 
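To make the identification of binary strings with points of $\mathbf{R}^n$ concrete, here is a minimal sketch. The helper names `bits` and `mismatches` are illustrative assumptions; the example strings are the ones from the remark.

```python
def bits(s):
    """Turn a binary string such as '10010' into the point (1, 0, 0, 1, 0)."""
    return tuple(int(c) for c in s)

def d_l1(x, y):
    """Taxi-cab (l^1) distance."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def mismatches(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))

print(d_l1(bits("10010"), bits("10101")))  # 3
print(mismatches("10010", "10101"))        # 3, the same count
```

The two functions agree because each coordinate difference is either 0 (bits match) or 1 (bits differ).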


Example 1.1.9 (Sup norm metric). Again let $n \geq 1$, and let $\mathbf{R}^n$ be as
before. But now we use a different metric $d_{l^\infty}$, the so-called sup norm
metric (or $l^\infty$ metric), defined by

$$d_{l^\infty}((x_1,x_2,\ldots,x_n),(y_1,y_2,\ldots,y_n)) := \sup\{|x_i - y_i| : 1 \leq i \leq n\}.$$

Thus for instance, if $n = 2$, then $d_{l^\infty}((1,6),(4,2)) = \sup(3,4) = 4$. The
space $(\mathbf{R}^n, d_{l^\infty})$ is also a metric space (Exercise 1.1.9), and is related to
the $l^2$ metric by the inequalities

$$\frac{1}{\sqrt{n}}\, d_{l^2}(x,y) \leq d_{l^\infty}(x,y) \leq d_{l^2}(x,y) \qquad (1.2)$$

for all $x, y$ (see Exercise 1.1.10).
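As with (1.1), the inequalities (1.2) can be spot-checked on random points. This is a sketch under the same assumptions as before (illustrative function names, a fixed seed, and a small tolerance for rounding); it is a sanity check, not a proof.

```python
import math
import random

def d_linf(x, y):
    """Sup norm (l^infinity) distance: the largest coordinate-wise difference."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

def d_l2(x, y):
    """Euclidean (l^2) distance."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(d_linf((1, 6), (4, 2)))  # sup(3, 4) = 4

rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    n = rng.randint(1, 6)
    x = [rng.uniform(-10, 10) for _ in range(n)]
    y = [rng.uniform(-10, 10) for _ in range(n)]
    l2, linf = d_l2(x, y), d_linf(x, y)
    # (1.2): d_l2 / sqrt(n) <= d_linf <= d_l2, up to rounding error
    assert l2 / math.sqrt(n) <= linf + 1e-9
    assert linf <= l2 + 1e-9
```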


Remark 1.1.10. The $l^1$, $l^2$, and $l^\infty$ metrics are special cases of the more
general $l^p$ metrics, where $p \in [1,+\infty]$, but we will not discuss these more
general metrics in this text.


Example 1.1.11 (Discrete metric). Let $X$ be an arbitrary set (finite or
infinite), and define the discrete metric $d_{disc}$ by setting $d_{disc}(x,y) := 0$
when $x = y$, and $d_{disc}(x,y) := 1$ when $x \neq y$. Thus, in this metric,
all points are equally far apart. The space $(X, d_{disc})$ is a metric space
(Exercise 1.1.11). Thus every set $X$ has at least one metric on it.
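Since the discrete metric only needs to know whether two points are equal, it can be written down for objects of any kind. A minimal sketch follows; the name `d_disc` mirrors the notation of the example, and the sample objects are arbitrary.

```python
def d_disc(x, y):
    """Discrete metric: 0 if the two points coincide, 1 otherwise."""
    return 0 if x == y else 1

print(d_disc("apple", "apple"))   # 0
print(d_disc("apple", "orange"))  # 1
print(d_disc((0, 0), (1e-9, 0)))  # 1: nearby in the Euclidean sense, but still distance 1
```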


Example 1.1.12 (Geodesics). (Informal) Let $X$ be the sphere
$\{(x,y,z) \in \mathbf{R}^3 : x^2 + y^2 + z^2 = 1\}$, and let $d((x,y,z),(x',y',z'))$ be
the length of the shortest curve in $X$ which starts at $(x,y,z)$ and ends
at $(x',y',z')$. (This curve turns out to be an arc of a great circle; we will
not prove this here, as it requires calculus of variations, which is beyond 
the scope of this text.) This makes X into a metric space; the reader 
should be able to verify (without using any geometry of the sphere) that 
the triangle inequality is more or less automatic from the definition. 


Example 1.1.13 (Shortest paths). (Informal) Examples of metric 
spaces occur all the time in real life. For instance, X could be all 
the computers currently connected to the internet, and d(x,y) is the 
shortest number of connections it would take for a packet to travel from 
computer x to computer y; for instance, if x and y are not directly con- 
nected, but are both connected to z, then d(x,y) = 2. Assuming that 
all computers in the internet can ultimately be connected to all other 
computers (so that d(x, y) is always finite), then (X,d) is a metric space 
(why?). Games such as “six degrees of separation” are also taking place 
in a similar metric space (what is the space, and what is the metric, 
in this case?). Or, X could be a major city, and d(x,y) could be the 
shortest time it takes to drive from x to y (although this space might 
not satisfy axiom (c) in real life!). 
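The network example can be made concrete with a breadth-first search. The sketch below assumes the network is given as a dictionary mapping each computer to the list of computers it is directly connected to, with every connection listed in both directions; the names `hop_distance` and `network` are hypothetical and chosen only for this illustration.

```python
from collections import deque

def hop_distance(graph, source, target):
    """Fewest connections needed to get from source to target; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

# x and y are not directly connected, but both are connected to z.
network = {"x": ["z"], "y": ["z"], "z": ["x", "y", "w"], "w": ["z"]}
print(hop_distance(network, "x", "y"))  # 2
```

Returning `None` for unreachable pairs corresponds to the caveat in the example: the function is a metric only when every computer can ultimately reach every other.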


Now that we have metric spaces, we can define convergence in these 
spaces.


Definition 1.1.14 (Convergence of sequences in metric spaces). Let $m$
be an integer, $(X,d)$ be a metric space and let $(x^{(n)})_{n=m}^\infty$ be a sequence
of points in $X$ (i.e., for every natural number $n \geq m$, we assume that
$x^{(n)}$ is an element of $X$). Let $x$ be a point in $X$. We say that $(x^{(n)})_{n=m}^\infty$
converges to $x$ with respect to the metric $d$, if and only if the limit
$\lim_{n \to \infty} d(x^{(n)}, x)$ exists and is equal to 0. In other words, $(x^{(n)})_{n=m}^\infty$
converges to $x$ with respect to $d$ if and only if for every $\varepsilon > 0$, there
exists an $N \geq m$ such that $d(x^{(n)}, x) \leq \varepsilon$ for all $n \geq N$. (Why are these
two definitions equivalent?)


Remark 1.1.15. In view of Lemma 1.1.1 we see that this definition generalizes
our existing notion of convergence of sequences of real numbers.
In many cases, it is obvious what the metric $d$ is, and so we shall often
just say “$(x^{(n)})_{n=m}^\infty$ converges to $x$” instead of “$(x^{(n)})_{n=m}^\infty$ converges to
$x$ with respect to the metric $d$” when there is no chance of confusion.
We also sometimes write “$x^{(n)} \to x$ as $n \to \infty$” instead.


Remark 1.1.16. There is nothing special about the superscript $n$ in
the above definition; it is a dummy variable. Saying that $(x^{(n)})_{n=m}^\infty$
converges to $x$ is exactly the same statement as saying that $(x^{(k)})_{k=m}^\infty$
converges to $x$, for example; and sometimes it is convenient to change
superscripts, for instance if the variable $n$ is already being used for some
other purpose. Similarly, it is not necessary for the sequence $x^{(n)}$ to
be denoted using the superscript $(n)$; the above definition is also valid
for sequences $x_n$, or functions $f(n)$, or indeed of any expression which
depends on $n$ and takes values in $X$. Finally, from Exercises 6.1.3, 6.1.4
we see that the starting point $m$ of the sequence is unimportant for the
purposes of taking limits; if $(x^{(n)})_{n=m}^\infty$ converges to $x$, then $(x^{(n)})_{n=m'}^\infty$
also converges to $x$ for any $m' \geq m$.


Example 1.1.17. We work in the Euclidean space $\mathbf{R}^2$ with the
standard Euclidean metric $d_{l^2}$. Let $(x^{(n)})_{n=1}^\infty$ denote the sequence
$x^{(n)} := (1/n, 1/n)$ in $\mathbf{R}^2$, i.e., we are considering the sequence
$(1,1), (1/2,1/2), (1/3,1/3), \ldots$. Then this sequence converges to $(0,0)$
with respect to the Euclidean metric $d_{l^2}$, since

$$\lim_{n \to \infty} d_{l^2}(x^{(n)}, (0,0)) = \lim_{n \to \infty} \sqrt{\frac{1}{n^2} + \frac{1}{n^2}} = \lim_{n \to \infty} \frac{\sqrt{2}}{n} = 0.$$

The sequence $(x^{(n)})_{n=1}^\infty$ also converges to $(0,0)$ with respect to the
taxi-cab metric $d_{l^1}$, since

$$\lim_{n \to \infty} d_{l^1}(x^{(n)}, (0,0)) = \lim_{n \to \infty} \frac{1}{n} + \frac{1}{n} = \lim_{n \to \infty} \frac{2}{n} = 0.$$

Similarly the sequence converges to $(0,0)$ in the sup norm metric $d_{l^\infty}$
(why?). However, the sequence $(x^{(n)})_{n=1}^\infty$ does not converge to $(0,0)$ in
the discrete metric $d_{disc}$, since

$$\lim_{n \to \infty} d_{disc}(x^{(n)}, (0,0)) = \lim_{n \to \infty} 1 = 1 \neq 0.$$

Thus the convergence of a sequence can depend on what metric one
uses!¹
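The contrast between the four metrics in this example is easy to see numerically. The following sketch tabulates the distance from $(1/n, 1/n)$ to the origin for a few values of $n$; the function names are illustrative, and a table of values of course only suggests, and does not prove, the limits computed above.

```python
import math

def d_l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def d_linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def d_disc(x, y):
    return 0 if x == y else 1

origin = (0.0, 0.0)
for n in (1, 10, 100, 1000):
    p = (1 / n, 1 / n)
    # The first three distances shrink like sqrt(2)/n, 2/n and 1/n;
    # the discrete distance stays at 1 because p never equals the origin.
    print(n, d_l2(p, origin), d_l1(p, origin), d_linf(p, origin), d_disc(p, origin))
```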


In the case of the above four metrics - Euclidean, taxi-cab, sup norm, 
and discrete - it is in fact rather easy to test for convergence. 


Proposition 1.1.18 (Equivalence of $l^1$, $l^2$, $l^\infty$). Let $\mathbf{R}^n$ be a Euclidean
space, and let $(x^{(k)})_{k=m}^\infty$ be a sequence of points in $\mathbf{R}^n$. We write
$x^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)})$, i.e., for $j = 1,2,\ldots,n$, $x_j^{(k)} \in \mathbf{R}$ is the $j^{th}$
coordinate of $x^{(k)} \in \mathbf{R}^n$. Let $x = (x_1,\ldots,x_n)$ be a point in $\mathbf{R}^n$. Then the
following four statements are equivalent:

(a) $(x^{(k)})_{k=m}^\infty$ converges to $x$ with respect to the Euclidean metric $d_{l^2}$.
(b) $(x^{(k)})_{k=m}^\infty$ converges to $x$ with respect to the taxi-cab metric $d_{l^1}$.
(c) $(x^{(k)})_{k=m}^\infty$ converges to $x$ with respect to the sup norm metric $d_{l^\infty}$.
(d) For every $1 \leq j \leq n$, the sequence $(x_j^{(k)})_{k=m}^\infty$ converges to $x_j$.
(Notice that this is a sequence of real numbers, not of points in $\mathbf{R}^n$.)


Proof. See Exercise 1.1.12. 


In other words, a sequence converges in the Euclidean, taxi-cab, 
or sup norm metric if and only if each of its components converges 
individually. Because of the equivalence of (a), (b) and (c), we say that 
the Euclidean, taxicab, and sup norm metrics on $\mathbf{R}^n$ are equivalent.
(There are infinite-dimensional analogues of the Euclidean, taxicab, and 
sup norm metrics which are not equivalent, see for instance Exercise 
1.1.15.) 


¹For a somewhat whimsical real-life example, one can give a city an “automobile
metric”, with $d(x,y)$ defined as the time it takes for a car to drive from $x$ to $y$, or a
“pedestrian metric”, where $d(x,y)$ is the time it takes to walk on foot from $x$ to $y$.
(Let us assume for sake of argument that these metrics are symmetric, though this is
not always the case in real life.) One can easily imagine examples where two points
are close in one metric but not another.

For the discrete metric, convergence is much rarer: the sequence
must be eventually constant in order to converge.


Proposition 1.1.19 (Convergence in the discrete metric). Let $X$ be
any set, and let $d_{disc}$ be the discrete metric on $X$. Let $(x^{(n)})_{n=m}^\infty$ be
a sequence of points in $X$, and let $x$ be a point in $X$. Then $(x^{(n)})_{n=m}^\infty$
converges to $x$ with respect to the discrete metric $d_{disc}$ if and only if there
exists an $N \geq m$ such that $x^{(n)} = x$ for all $n \geq N$.


Proof. See Exercise 1.1.13. 


We now prove a basic fact about converging sequences; they can only 
converge to at most one point at a time. 


Proposition 1.1.20 (Uniqueness of limits). Let $(X,d)$ be a metric
space, and let $(x^{(n)})_{n=m}^\infty$ be a sequence in $X$. Suppose that there are
two points $x, x' \in X$ such that $(x^{(n)})_{n=m}^\infty$ converges to $x$ with respect to
$d$, and $(x^{(n)})_{n=m}^\infty$ also converges to $x'$ with respect to $d$. Then we have
$x = x'$.


Proof. See Exercise 1.1.14. 


Because of the above Proposition, it is safe to introduce the following
notation: if $(x^{(n)})_{n=m}^\infty$ converges to $x$ in the metric $d$, then we write
$d\text{-}\lim_{n \to \infty} x^{(n)} = x$, or simply $\lim_{n \to \infty} x^{(n)} = x$ when there is no
confusion as to what $d$ is. For instance, in the example of $(\frac{1}{n}, \frac{1}{n})$, we
have

$$d_{l^2}\text{-}\lim_{n \to \infty} \left(\frac{1}{n}, \frac{1}{n}\right) = d_{l^1}\text{-}\lim_{n \to \infty} \left(\frac{1}{n}, \frac{1}{n}\right) = (0,0),$$

but $d_{disc}\text{-}\lim_{n \to \infty} (\frac{1}{n}, \frac{1}{n})$ is undefined. Thus the meaning of
$d\text{-}\lim_{n \to \infty} x^{(n)}$ can depend on what $d$ is; however Proposition 1.1.20 assures
us that once $d$ is fixed, there can be at most one value of $d\text{-}\lim_{n \to \infty} x^{(n)}$.
(Of course, it is still possible that this limit does not exist; some sequences
are not convergent.) Note that by Lemma 1.1.1, this definition
of limit generalizes the notion of limit in Definition 6.1.8.


Remark 1.1.21. It is possible for a sequence to converge to one point 
using one metric, and another point using a different metric, although 
such examples are usually quite artificial. For instance, let X := [0,1], 
the closed interval from 0 to 1. Using the usual metric d, we have 
$d-\lim_{n \to \infty} \frac{1}{n} = 0$. But now suppose we “swap” the points 0 and 1 in
the following manner. Let $f : [0,1] \to [0,1]$ be the function defined by
$f(0) := 1$, $f(1) := 0$, and $f(x) := x$ for all $x \in (0,1)$, and then define
$d'(x,y) := d(f(x), f(y))$. Then $(X,d')$ is still a metric space (why?), but
now $d'-\lim_{n \to \infty} \frac{1}{n} = 1$. Thus changing the metric on a space can greatly
affect the nature of convergence (also called the topology) on that space; 
see Section 2.5 for a further discussion of topology. 
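
The swap in Remark 1.1.21 can be checked numerically. The following is a minimal sketch, not part of the original text; the helper names `d`, `f`, and `d_prime` are chosen here for illustration. It tabulates the distance from 1/n to the two candidate limits under each metric.

```python
def d(x, y):
    """Usual metric on [0, 1]."""
    return abs(x - y)

def f(x):
    """Swap the endpoints 0 and 1; leave every other point of [0, 1] fixed."""
    if x == 0:
        return 1
    if x == 1:
        return 0
    return x

def d_prime(x, y):
    """The modified metric d'(x, y) := d(f(x), f(y))."""
    return d(f(x), f(y))

ns = [10, 100, 1000, 10000]
dist_d_to_0 = [d(1 / n, 0) for n in ns]         # shrinks towards 0
dist_dp_to_1 = [d_prime(1 / n, 1) for n in ns]  # also shrinks towards 0
dist_dp_to_0 = [d_prime(1 / n, 0) for n in ns]  # stays close to 1
```

Under d' the distance from 1/n to the point 1 is exactly the usual distance from 1/n to 0, which is why the limit moves from 0 to 1.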


— Exercises — 
Exercise 1.1.1. Prove Lemma 1.1.1. 


Exercise 1.1.2. Show that the real line with the metric d(x, y) := |x—y| is indeed 
a metric space. (Hint: you may wish to review your proof of Proposition 4.3.3.) 


Exercise 1.1.3. Let X be a set, and let $d : X \times X \to [0,\infty)$ be a function.
(a) Give an example of a pair (X,d) which obeys axioms (bcd) of Definition 
1.1.2, but not (a). (Hint: modify the discrete metric.) 


(b) Give an example of a pair (X,d) which obeys axioms (acd) of Definition 
1.1.2, but not (b). 


(c) Give an example of a pair (X,d) which obeys axioms (abd) of Definition 
1.1.2, but not (c). 


(d) Give an example of a pair (X,d) which obeys axioms (abc) of Definition 
1.1.2, but not (d). (Hint: try examples where X is a finite set.) 


Exercise 1.1.4. Show that the pair $(Y, d|_{Y \times Y})$ defined in Example 1.1.5 is indeed
a metric space. 


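Exercise 1.1.5. Let $n \geq 1$, and let $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ be real
numbers. Verify the identity

$$\left(\sum_{i=1}^n a_i b_i\right)^2 + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (a_i b_j - a_j b_i)^2 = \left(\sum_{i=1}^n a_i^2\right) \left(\sum_{j=1}^n b_j^2\right),$$

and conclude the Cauchy-Schwarz inequality

$$\left|\sum_{i=1}^n a_i b_i\right| \leq \left(\sum_{i=1}^n a_i^2\right)^{1/2} \left(\sum_{j=1}^n b_j^2\right)^{1/2}. \qquad (1.3)$$

Then use the Cauchy-Schwarz inequality to prove the triangle inequality

$$\left(\sum_{i=1}^n (a_i + b_i)^2\right)^{1/2} \leq \left(\sum_{i=1}^n a_i^2\right)^{1/2} + \left(\sum_{j=1}^n b_j^2\right)^{1/2}.$$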
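
Before attempting the proof, it can help to test the identity of Exercise 1.1.5 on concrete numbers. The sketch below is only a sanity check on one randomly chosen pair of rational vectors, not a proof; the function name `lagrange_sides` is an illustrative choice.

```python
from fractions import Fraction
import math
import random

def lagrange_sides(a, b):
    """Return (left side, right side) of the identity in Exercise 1.1.5."""
    n = len(a)
    dot = sum(a[i] * b[i] for i in range(n))
    cross = sum((a[i] * b[j] - a[j] * b[i]) ** 2
                for i in range(n) for j in range(n))
    lhs = dot ** 2 + Fraction(1, 2) * cross
    rhs = sum(x * x for x in a) * sum(y * y for y in b)
    return lhs, rhs

random.seed(0)
a = [Fraction(random.randint(-9, 9), random.randint(1, 9)) for _ in range(5)]
b = [Fraction(random.randint(-9, 9), random.randint(1, 9)) for _ in range(5)]

lhs, rhs = lagrange_sides(a, b)            # exact rational arithmetic
dot_sq = sum(x * y for x, y in zip(a, b)) ** 2

# Triangle inequality, evaluated in floating point.
norm = lambda v: math.sqrt(sum(float(x) ** 2 for x in v))
triangle_gap = norm(a) + norm(b) - norm([x + y for x, y in zip(a, b)])
```

Because the double sum is a sum of squares, the identity immediately bounds the squared dot product by the right-hand side, which is the step the exercise asks for.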


Exercise 1.1.6. Show that $(\mathbf{R}^n, d_{l^2})$ in Example 1.1.6 is indeed a metric space.
(Hint: use Exercise 1.1.5.) 




Exercise 1.1.7. Show that the pair $(\mathbf{R}^n, d_{l^1})$ in Example 1.1.7 is indeed a metric
space. 


Exercise 1.1.8. Prove the two inequalities in (1.1). (For the first inequality,
square both sides. For the second inequality, use Exercise 1.1.5.)


Exercise 1.1.9. Show that the pair $(\mathbf{R}^n, d_{l^\infty})$ in Example 1.1.9 is indeed a
metric space. 


Exercise 1.1.10. Prove the two inequalities in (1.2). 


Exercise 1.1.11. Show that the discrete metric $(\mathbf{R}^n, d_{disc})$ in Example 1.1.11 is
indeed a metric space. 


Exercise 1.1.12. Prove Proposition 1.1.18. 
Exercise 1.1.13. Prove Proposition 1.1.19. 


Exercise 1.1.14. Prove Proposition 1.1.20. (Hint: modify the proof of Propo- 
sition 6.1.7.) 


Exercise 1.1.15. Let

$$X := \left\{ (a_n)_{n=0}^\infty : \sum_{n=0}^\infty |a_n| < \infty \right\}$$

be the space of absolutely convergent sequences. Define the $l^1$ and $l^\infty$ metrics
on this space by

$$d_{l^1}((a_n)_{n=0}^\infty, (b_n)_{n=0}^\infty) := \sum_{n=0}^\infty |a_n - b_n|;$$

$$d_{l^\infty}((a_n)_{n=0}^\infty, (b_n)_{n=0}^\infty) := \sup_{n \in \mathbf{N}} |a_n - b_n|.$$

Show that these are both metrics on X, but show that there exist sequences
$x^{(1)}, x^{(2)}, \ldots$ of elements of X (i.e., sequences of sequences) which are
convergent with respect to the $d_{l^\infty}$ metric but not with respect to the $d_{l^1}$ metric.
Conversely, show that any sequence which converges in the $d_{l^1}$ metric
automatically converges in the $d_{l^\infty}$ metric.

Exercise 1.1.16. Let $(x_n)_{n=1}^\infty$ and $(y_n)_{n=1}^\infty$ be two sequences in a metric space
(X,d). Suppose that $(x_n)_{n=1}^\infty$ converges to a point $x \in X$, and $(y_n)_{n=1}^\infty$
converges to a point $y \in X$. Show that $\lim_{n \to \infty} d(x_n, y_n) = d(x,y)$. (Hint: use
the triangle inequality several times.)


1.2 Some point-set topology of metric spaces 


Having defined the operation of convergence on metric spaces, we now
define several other related notions, including those of open set, closed
set, interior, exterior, boundary, and adherent point. The study of such
notions is known as point-set topology, which we shall return to in
Section 2.5.

We first need the notion of a metric ball, or more simply a ball.


Definition 1.2.1 (Balls). Let (X,d) be a metric space, let $x_0$ be a point
in X, and let r > 0. We define the ball $B_{(X,d)}(x_0, r)$ in X, centered at
$x_0$, and with radius r, in the metric d, to be the set

$$B_{(X,d)}(x_0, r) := \{x \in X : d(x, x_0) < r\}.$$

When it is clear what the metric space (X,d) is, we shall abbreviate
$B_{(X,d)}(x_0, r)$ as just $B(x_0, r)$.


Example 1.2.2. In $\mathbf{R}^2$ with the Euclidean metric $d_{l^2}$, the ball
$B_{(\mathbf{R}^2, d_{l^2})}((0,0), 1)$ is the open disc

$$B_{(\mathbf{R}^2, d_{l^2})}((0,0), 1) = \{(x,y) \in \mathbf{R}^2 : x^2 + y^2 < 1\}.$$

However, if one uses the taxi-cab metric $d_{l^1}$ instead, then we obtain a
diamond:

$$B_{(\mathbf{R}^2, d_{l^1})}((0,0), 1) = \{(x,y) \in \mathbf{R}^2 : |x| + |y| < 1\}.$$

If we use the discrete metric, the ball is now reduced to a single point:

$$B_{(\mathbf{R}^2, d_{disc})}((0,0), 1) = \{(0,0)\},$$

although if one increases the radius to be larger than 1, then the ball
now encompasses all of $\mathbf{R}^2$. (Why?)
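
The three balls in Example 1.2.2 differ only in the metric plugged into Definition 1.2.1. As a small illustrative sketch (the function names are chosen here and do not come from the text), membership in a ball can be tested directly from the definition:

```python
def d_l2(p, q):
    """Euclidean metric on R^2."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def d_l1(p, q):
    """Taxi-cab metric on R^2."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_disc(p, q):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if p == q else 1

def in_ball(metric, center, radius, point):
    """Definition 1.2.1: point lies in B(center, radius) iff d(point, center) < radius."""
    return metric(point, center) < radius

origin = (0.0, 0.0)
p = (0.6, 0.6)

in_disc = in_ball(d_l2, origin, 1, p)             # inside the open disc
in_diamond = in_ball(d_l1, origin, 1, p)          # outside the diamond
in_disc_ball_1 = in_ball(d_disc, origin, 1, p)    # radius 1: only the center
in_disc_ball_big = in_ball(d_disc, origin, 1.5, p)  # radius > 1: everything
```

The point (0.6, 0.6) has Euclidean norm about 0.85 but taxi-cab norm 1.2, so it separates the disc from the diamond.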


Example 1.2.3. In $\mathbf{R}$ with the usual metric d, the open interval (3, 7)
is also the metric ball $B_{(\mathbf{R},d)}(5, 2)$.


Remark 1.2.4. Note that the smaller the radius r, the smaller the ball
$B(x_0, r)$. However, $B(x_0, r)$ always contains at least one point, namely
the center $x_0$, as long as r stays positive, thanks to Definition 1.1.2(a).
(We don’t consider balls of zero radius or negative radius since they are
rather boring, being just the empty set.)


Using metric balls, one can now take a set E in a metric space X,
and classify three types of points in X: interior, exterior, and boundary
points of E.


Definition 1.2.5 (Interior, exterior, boundary). Let (X,d) be a metric
space, let E be a subset of X, and let $x_0$ be a point in X. We say
that $x_0$ is an interior point of E if there exists a radius r > 0 such that
$B(x_0, r) \subseteq E$. We say that $x_0$ is an exterior point of E if there exists a
radius r > 0 such that $B(x_0, r) \cap E = \emptyset$. We say that $x_0$ is a boundary
point of E if it is neither an interior point nor an exterior point of E.


The set of all interior points of E is called the interior of E and is
sometimes denoted int(E). The set of exterior points of E is called the
exterior of E and is sometimes denoted ext(E). The set of boundary
points of E is called the boundary of E and is sometimes denoted $\partial E$.


Remark 1.2.6. If $x_0$ is an interior point of E, then $x_0$ must actually
be an element of E, since balls $B(x_0, r)$ always contain their center $x_0$.
Conversely, if $x_0$ is an exterior point of E, then $x_0$ cannot be an element
of E. In particular it is not possible for $x_0$ to simultaneously be an
interior and an exterior point of E. If $x_0$ is a boundary point of E, then
it could be an element of E, but it could also not lie in E; we give some
examples below.


Example 1.2.7. We work on the real line $\mathbf{R}$ with the standard metric
d. Let E be the half-open interval E = [1,2). The point 1.5 is an interior
point of E, since one can find a ball (for instance B(1.5,0.1)) centered
at 1.5 which lies in E. The point 3 is an exterior point of E, since one
can find a ball (for instance B(3,0.1)) centered at 3 which is disjoint
from E. The points 1 and 2, however, are neither interior points nor
exterior points of E, and are thus boundary points of E. Thus in this
case int(E) = (1,2), ext(E) = $(-\infty, 1) \cup (2, \infty)$, and $\partial E = \{1,2\}$. Note
that in this case one of the boundary points is an element of E, while
the other is not.
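
The classification in Example 1.2.7 can be mimicked mechanically. The sketch below is an illustration only: it hard-codes E = [1,2), uses the fact that a ball in the real line is the open interval (x - r, x + r), and searches a finite list of radii, so it would misclassify points closer to 1 or 2 than the smallest radius tried. All function names are chosen here.

```python
def ball_inside_E(x, r):
    """Is the ball (x - r, x + r) contained in E = [1, 2)?"""
    return x - r >= 1 and x + r <= 2

def ball_disjoint_from_E(x, r):
    """Is the ball (x - r, x + r) disjoint from E = [1, 2)?"""
    return x + r <= 1 or x - r >= 2

def classify(x, radii=tuple(2.0 ** -k for k in range(1, 40))):
    """Apply Definition 1.2.5, trying each radius in the finite list `radii`."""
    if any(ball_inside_E(x, r) for r in radii):
        return "interior"
    if any(ball_disjoint_from_E(x, r) for r in radii):
        return "exterior"
    return "boundary"

labels = {x: classify(x) for x in (0, 1, 1.5, 2, 3)}
```

For the points 1 and 2 no radius works in either test, which is exactly what makes them boundary points.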


Example 1.2.8. When we give a set X the discrete metric $d_{disc}$, and
E is any subset of X, then every element of E is an interior point of E,
every point not contained in E is an exterior point of E, and there are
no boundary points; see Exercise 1.2.1.


Definition 1.2.9 (Closure). Let (X,d) be a metric space, let E be a
subset of X, and let $x_0$ be a point in X. We say that $x_0$ is an adherent
point of E if for every radius r > 0, the ball $B(x_0, r)$ has a non-empty
intersection with E. The set of all adherent points of E is called the
closure of E and is denoted $\overline{E}$.


Note that these notions are consistent with the corresponding notions
on the real line defined in Definitions 9.1.8, 9.1.10 (why?).

The following proposition links the notions of adherent point with
interior and boundary point, and also to that of convergence.


Proposition 1.2.10. Let (X,d) be a metric space, let E be a subset
of X, and let $x_0$ be a point in X. Then the following statements are
logically equivalent.


(a) $x_0$ is an adherent point of E.

(b) $x_0$ is either an interior point or a boundary point of E.

(c) There exists a sequence $(x_n)_{n=1}^\infty$ in E which converges to $x_0$ with
respect to the metric d.


Proof. See Exercise 1.2.2. 


From the equivalence of Proposition 1.2.10(a) and (b) we obtain an 
immediate corollary: 


Corollary 1.2.11. Let (X,d) be a metric space, and let E be a subset
of X. Then $\overline{E} = \mathrm{int}(E) \cup \partial E = X \setminus \mathrm{ext}(E)$.


As remarked earlier, the boundary of a set E may or may not lie in
E. Depending on how the boundary is situated, we may call a set open,
closed, or neither:


Definition 1.2.12 (Open and closed sets). Let (X,d) be a metric space,
and let E be a subset of X. We say that E is closed if it contains all of
its boundary points, i.e., $\partial E \subseteq E$. We say that E is open if it contains
none of its boundary points, i.e., $\partial E \cap E = \emptyset$. If E contains some of its
boundary points but not others, then it is neither open nor closed.


Example 1.2.13. We work in the real line R with the standard metric 
d. The set (1,2) does not contain either of its boundary points 1, 2 and 
is hence open. The set [1,2] contains both of its boundary points 1, 2 
and is hence closed. The set [1,2) contains one of its boundary points 
1, but does not contain the other boundary point 2, so is neither open 
nor closed. 


Remark 1.2.14. It is possible for a set to be simultaneously open and
closed, if it has no boundary. For instance, in a metric space (X,d), the
whole space X has no boundary (every point in X is an interior point
- why?), and so X is both open and closed. The empty set $\emptyset$ also has
no boundary (every point in X is an exterior point - why?), and so $\emptyset$ is
both open and closed. In many cases these are the only sets that are
simultaneously open and closed, but there are exceptions. For instance,
using the discrete metric $d_{disc}$, every set is both open and closed! (why?)

From the above two remarks we see that the notions of being open
and being closed are not negations of each other; there are sets that
are both open and closed, and there are sets which are neither open nor
closed. Thus, if one knew for instance that E was not an open set, it
would be erroneous to conclude from this that E was a closed set, and
similarly with the roles of open and closed reversed. The correct
relationship between open and closed sets is given by Proposition 1.2.15(e)
below.

Now we list some more properties of open and closed sets. 


Proposition 1.2.15 (Basic properties of open and closed sets). Let
(X,d) be a metric space.


(a) Let E be a subset of X. Then E is open if and only if E = int(E).
In other words, E is open if and only if for every $x \in E$, there
exists an r > 0 such that $B(x, r) \subseteq E$.


(b) Let E be a subset of X. Then E is closed if and only if E contains
all its adherent points. In other words, E is closed if and only if
for every convergent sequence $(x_n)_{n=m}^\infty$ in E, the limit $\lim_{n \to \infty} x_n$
of that sequence also lies in E.


(c) For any $x_0 \in X$ and r > 0, the ball $B(x_0, r)$ is an open
set. The set $\{x \in X : d(x, x_0) \leq r\}$ is a closed set. (This set is
sometimes called the closed ball of radius r centered at $x_0$.)


(d) Any singleton set $\{x_0\}$, where $x_0 \in X$, is automatically closed.


(e) If E is a subset of X, then E is open if and only if the complement
$X \setminus E := \{x \in X : x \notin E\}$ is closed.


(f) If $E_1, \ldots, E_n$ is a finite collection of open sets in X, then
$E_1 \cap E_2 \cap \ldots \cap E_n$ is also open. If $F_1, \ldots, F_n$ is a finite collection of
closed sets in X, then $F_1 \cup F_2 \cup \ldots \cup F_n$ is also closed.


(g) If $\{E_\alpha\}_{\alpha \in I}$ is a collection of open sets in X (where the index
set I could be finite, countable, or uncountable), then the union
$\bigcup_{\alpha \in I} E_\alpha := \{x \in X : x \in E_\alpha \text{ for some } \alpha \in I\}$ is also open. If
$\{F_\alpha\}_{\alpha \in I}$ is a collection of closed sets in X, then the intersection
$\bigcap_{\alpha \in I} F_\alpha := \{x \in X : x \in F_\alpha \text{ for all } \alpha \in I\}$ is also closed.


(h) If E is any subset of X, then int(E) is the largest open set which
is contained in E; in other words, int(E) is open, and given any
other open set $V \subseteq E$, we have $V \subseteq \mathrm{int}(E)$. Similarly $\overline{E}$ is the
smallest closed set which contains E; in other words, $\overline{E}$ is closed,
and given any other closed set $K \supseteq E$, we have $K \supseteq \overline{E}$.


Proof. See Exercise 1.2.3. 


— Exercises — 
Exercise 1.2.1. Verify the claims in Example 1.2.8. 


Exercise 1.2.2. Prove Proposition 1.2.10. (Hint: for some of the implications 
one will need the axiom of choice, as in Lemma 8.4.5.) 


Exercise 1.2.3. Prove Proposition 1.2.15. (Hint: you can use earlier parts of 
the proposition to prove later ones.) 


Exercise 1.2.4. Let (X,d) be a metric space, $x_0$ be a point in X, and r > 0.
Let B be the open ball $B := B(x_0, r) = \{x \in X : d(x, x_0) < r\}$, and let C be
the closed ball $C := \{x \in X : d(x, x_0) \leq r\}$.


(a) Show that $\overline{B} \subseteq C$.


(b) Give an example of a metric space (X,d), a point $x_0$, and a radius r > 0
such that $\overline{B}$ is not equal to C.


1.3 Relative topology


When we defined notions such as open and closed sets, we mentioned 
that such concepts depended on the choice of metric one uses. For 
instance, on the real line R, if one uses the usual metric d(x, y) = |x—y|, 
then the set {1} is not open; however, if instead one uses the discrete
metric $d_{disc}$, then {1} is now an open set (why?).

However, it is not just the choice of metric which determines what 
is open and what is not - it is also the choice of ambient space X. Here 
are some examples. 


Example 1.3.1. Consider the plane $\mathbf{R}^2$ with the Euclidean metric $d_{l^2}$.
Inside the plane, we can find the x-axis $X := \{(x, 0) : x \in \mathbf{R}\}$. The
metric $d_{l^2}$ can be restricted to X, creating a subspace $(X, d_{l^2}|_{X \times X})$ of
$(\mathbf{R}^2, d_{l^2})$. (This subspace is essentially the same as the real line $(\mathbf{R}, d)$
with the usual metric; the precise way of stating this is that $(X, d_{l^2}|_{X \times X})$
is isometric to $(\mathbf{R}, d)$. We will not pursue this concept further in this
text, however.) Now consider the set

$$E := \{(x, 0) : -1 < x < 1\},$$

which is both a subset of X and of $\mathbf{R}^2$. Viewed as a subset of $\mathbf{R}^2$, it is not
open, because the point 0, for instance, lies in E but is not an interior
point of E. (Any ball $B_{(\mathbf{R}^2, d_{l^2})}(0, r)$ will contain at least one point that
lies outside of the x-axis, and hence outside of E.) On the other hand, if
viewed as a subset of X, it is open; every point of E is an interior point
of E with respect to the metric space $(X, d_{l^2}|_{X \times X})$. For instance, the
point 0 is now an interior point of E, because the ball $B_{(X, d_{l^2}|_{X \times X})}(0, 1)$
is contained in E (in fact, in this case it is E).


Example 1.3.2. Consider the real line R with the standard metric d, 
and let X be the interval X := (—1,1) contained inside R; we can then 
restrict the metric d to X, creating a subspace $(X, d|_{X \times X})$ of $(\mathbf{R}, d)$.
Now consider the set [0,1). This set is not closed in R, because the 
point 1 is adherent to [0,1) but is not contained in [0,1). However, 
when considered as a subset of X, the set [0,1) now becomes closed; the 
point 1 is not an element of X and so is no longer considered an adherent 
point of [0,1), and so now [0, 1) contains all of its adherent points. 


To clarify this distinction, we make a definition. 


Definition 1.3.3 (Relative topology). Let (X,d) be a metric space,
let Y be a subset of X, and let E be a subset of Y. We say that E
is relatively open with respect to Y if it is open in the metric subspace
$(Y, d|_{Y \times Y})$. Similarly, we say that E is relatively closed with respect to
Y if it is closed in the metric space $(Y, d|_{Y \times Y})$.


The relationship between open (or closed) sets in X, and relatively 
open (or relatively closed) sets in Y, is the following. 


Proposition 1.3.4. Let (X,d) be a metric space, let Y be a subset of 
X, and let E be a subset of Y. 


(a) E is relatively open with respect to Y if and only if $E = V \cap Y$ for
some set $V \subseteq X$ which is open in X.


(b) E is relatively closed with respect to Y if and only if $E = K \cap Y$
for some set $K \subseteq X$ which is closed in X.


Proof. We just prove (a), and leave (b) to Exercise 1.3.1. First suppose
that E is relatively open with respect to Y. Then, E is open in the
metric space $(Y, d|_{Y \times Y})$. Thus, for every $x \in E$, there exists a radius
r > 0 such that the ball $B_{(Y, d|_{Y \times Y})}(x, r)$ is contained in E. This radius r
depends on x; to emphasize this we write $r_x$ instead of r, thus for every
$x \in E$ the ball $B_{(Y, d|_{Y \times Y})}(x, r_x)$ is contained in E. (Note that we have
used the axiom of choice, Proposition 8.4.7, to do this.)
Now consider the set

$$V := \bigcup_{x \in E} B_{(X,d)}(x, r_x).$$

This is a subset of X. By Proposition 1.2.15(c) and (g), V is open. Now
we prove that $E = V \cap Y$. Certainly any point x in E lies in $V \cap Y$,
since it lies in Y and it also lies in $B_{(X,d)}(x, r_x)$, and hence in V. Now
suppose that y is a point in $V \cap Y$. Then $y \in V$, which implies that
there exists an $x \in E$ such that $y \in B_{(X,d)}(x, r_x)$. But since y is also in
Y, this implies that $y \in B_{(Y, d|_{Y \times Y})}(x, r_x)$. But by definition of $r_x$, this
means that $y \in E$, as desired. Thus we have found an open set V for
which $E = V \cap Y$ as desired.

Now we do the converse. Suppose that $E = V \cap Y$ for some open set
V; we have to show that E is relatively open with respect to Y. Let x
be any point in E; we have to show that x is an interior point of E in
the metric space $(Y, d|_{Y \times Y})$. Since $x \in E$, we know $x \in V$. Since V is
open in X, we know that there is a radius r > 0 such that $B_{(X,d)}(x, r)$
is contained in V. Strictly speaking, r depends on x, and so we could
write $r_x$ instead of r, but for this argument we will only use a single
choice of x (as opposed to the argument in the previous paragraph) and
so we will not bother to subscript r here. Since $E = V \cap Y$, this means
that $B_{(X,d)}(x, r) \cap Y$ is contained in E. But $B_{(X,d)}(x, r) \cap Y$ is exactly
the same as $B_{(Y, d|_{Y \times Y})}(x, r)$ (why?), and so $B_{(Y, d|_{Y \times Y})}(x, r)$ is contained
in E. Thus x is an interior point of E in the metric space $(Y, d|_{Y \times Y})$,
as desired.
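
The key step in the converse direction is that a ball in the subspace Y is the ambient ball intersected with Y. The following sketch checks this on a small example. It is illustrative only: the ambient space `X` is assumed here to be a finite integer grid standing in for the plane, `Y` is the part of the grid on the x-axis, and the names `d` and `ball` are chosen for this sketch.

```python
from itertools import product

def d(p, q):
    """Euclidean metric on pairs of numbers."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def ball(space, center, radius):
    """Ball of Definition 1.2.1 inside the metric space (space, d)."""
    return {p for p in space if d(p, center) < radius}

# Finite grid (an assumption of this sketch) and the grid points on the x-axis.
X = set(product(range(-3, 4), repeat=2))
Y = {p for p in X if p[1] == 0}

center, r = (0, 0), 2
relative_ball = ball(Y, center, r)   # ball taken in the subspace Y
ambient_ball = ball(X, center, r)    # ball taken in the ambient space X
```

Here the ambient ball contains points such as (0, 1) that leave Y, while the relative ball is just the three points (-1, 0), (0, 0), (1, 0).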


— Exercises — 


Exercise 1.3.1. Prove Proposition 1.3.4(b). 


1.4 Cauchy sequences and complete metric spaces 


We now generalize much of the theory of limits of sequences from Chap- 
ter 6 to the setting of general metric spaces. We begin by generalizing 
the notion of a subsequence from Definition 6.6.1: 


Definition 1.4.1 (Subsequences). Suppose that $(x^{(n)})_{n=m}^\infty$ is a sequence
of points in a metric space (X,d). Suppose that $n_1, n_2, n_3, \ldots$ is an
increasing sequence of integers which are at least as large as m, thus

$$m \leq n_1 < n_2 < n_3 < \ldots.$$

Then we call the sequence $(x^{(n_j)})_{j=1}^\infty$ a subsequence of the original
sequence $(x^{(n)})_{n=m}^\infty$.


Examples 1.4.2. The sequence $((\frac{1}{j^2}, \frac{1}{j^2}))_{j=1}^\infty$ in $\mathbf{R}^2$ is a subsequence
of the sequence $((\frac{1}{n}, \frac{1}{n}))_{n=1}^\infty$ (in this case, $n_j := j^2$). The sequence
1,1,1,1,... is a subsequence of 1,0,1,0,1,....


If a sequence converges, then so do all of its subsequences: 


Lemma 1.4.3. Let $(x^{(n)})_{n=m}^\infty$ be a sequence in (X,d) which converges
to some limit $x_0$. Then every subsequence $(x^{(n_j)})_{j=1}^\infty$ of that sequence
also converges to $x_0$.


Proof. See Exercise 1.4.1.


On the other hand, it is possible for a subsequence to be conver- 
gent without the sequence as a whole being convergent. For example, 
the sequence 1,0,1,0,1,...is not convergent, even though certain subse- 
quences of it (such as 1,1,1,...) converge. To quantify this phenomenon, 
we generalize Definition 6.4.1 as follows: 


Definition 1.4.4 (Limit points). Suppose that $(x^{(n)})_{n=m}^\infty$ is a sequence
of points in a metric space (X,d), and let $L \in X$. We say that L is a
limit point of $(x^{(n)})_{n=m}^\infty$ iff for every $N \geq m$ and $\varepsilon > 0$ there exists an
$n \geq N$ such that $d(x^{(n)}, L) < \varepsilon$.


Proposition 1.4.5. Let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in a metric
space (X,d), and let $L \in X$. Then the following are equivalent:


• L is a limit point of $(x^{(n)})_{n=m}^\infty$.


• There exists a subsequence $(x^{(n_j)})_{j=1}^\infty$ of the original sequence
$(x^{(n)})_{n=m}^\infty$ which converges to L.


Proof. See Exercise 1.4.2. 
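
For the alternating sequence 1,0,1,0,... discussed above, both 0 and 1 are limit points, each reached along a subsequence. The sketch below tests Definition 1.4.4 over a finite range of indices only, so it is a heuristic check rather than a verification; the function name and the `horizon` parameter are choices made for this illustration.

```python
def x(n):
    """The sequence 1, 0, 1, 0, ... indexed from n = 1."""
    return 1 if n % 2 == 1 else 0

def looks_like_limit_point(seq, L, eps, m=1, horizon=200):
    """For every N in [m, horizon], search for an index n with
    N <= n <= 2 * horizon and |seq(n) - L| < eps."""
    return all(
        any(abs(seq(n) - L) < eps for n in range(N, 2 * horizon + 1))
        for N in range(m, horizon + 1)
    )

one_is_limit_point = looks_like_limit_point(x, 1, 0.1)
zero_is_limit_point = looks_like_limit_point(x, 0, 0.1)
half_is_limit_point = looks_like_limit_point(x, 0.5, 0.1)

# The subsequence with n_j := 2j - 1 is constantly 1, matching Proposition 1.4.5.
odd_subsequence = [x(2 * j - 1) for j in range(1, 11)]
```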


Next, we review the notion of a Cauchy sequence from Definition
6.1.3 (see also Definition 5.1.8).


Definition 1.4.6 (Cauchy sequences). Let $(x^{(n)})_{n=m}^\infty$ be a sequence
of points in a metric space (X,d). We say that this sequence is a
Cauchy sequence iff for every $\varepsilon > 0$, there exists an $N \geq m$ such that
$d(x^{(j)}, x^{(k)}) < \varepsilon$ for all $j, k \geq N$.


Lemma 1.4.7 (Convergent sequences are Cauchy sequences). Let
$(x^{(n)})_{n=m}^\infty$ be a sequence in (X,d) which converges to some limit $x_0$.
Then $(x^{(n)})_{n=m}^\infty$ is also a Cauchy sequence.


Proof. See Exercise 1.4.3. 


It is also easy to check that every subsequence of a Cauchy sequence is
also a Cauchy sequence (why?). However, not every Cauchy sequence
converges:


Example 1.4.8. (Informal) Consider the sequence

3, 3.1, 3.14, 3.141, 3.1415, ...

in the metric space $(\mathbf{Q}, d)$ (the rationals $\mathbf{Q}$ with the usual metric
$d(x,y) := |x - y|$). While this sequence is convergent in $\mathbf{R}$ (it
converges to $\pi$), it does not converge in $\mathbf{Q}$ (since $\pi \notin \mathbf{Q}$, and a sequence
cannot converge to two different limits).
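
The decimal truncations in Example 1.4.8 are rational numbers, and the Cauchy property can be seen from the bound that any two terms from the j-th onwards differ by less than $10^{-j}$. The following sketch checks this bound for the first few terms in exact rational arithmetic; the digit string and the function name `truncation` are supplied here for illustration.

```python
from fractions import Fraction

# Leading decimal digits of pi (3.14159265358979...), with the point removed.
PI_DIGITS = "314159265358979"

def truncation(k):
    """The rational 3.d1 d2 ... dk obtained by keeping k decimals of pi."""
    return Fraction(int(PI_DIGITS[: k + 1]), 10 ** k)

terms = [truncation(k) for k in range(10)]   # 3, 3.1, 3.14, 3.141, ...

# For j < k the difference x_k - x_j consists only of digits beyond
# the j-th decimal place, so it lies in [0, 10^{-j}).
cauchy_bound_holds = all(
    0 <= terms[k] - terms[j] < Fraction(1, 10 ** j)
    for j in range(len(terms))
    for k in range(j + 1, len(terms))
)
```

Every term is an element of Q, yet the only candidate limit is irrational, which is the sense in which (Q, d) has a "hole" at this point.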


So in certain metric spaces, Cauchy sequences do not necessarily 
converge. However, if even part of a Cauchy sequence converges, then 
the entire Cauchy sequence must converge (to the same limit): 


Lemma 1.4.9. Let $(x^{(n)})_{n=m}^\infty$ be a Cauchy sequence in (X,d). Suppose
that there is some subsequence $(x^{(n_j)})_{j=1}^\infty$ of this sequence which
converges to a limit $x_0$ in X. Then the original sequence $(x^{(n)})_{n=m}^\infty$ also
converges to $x_0$.

Proof. See Exercise 1.4.4.


In Example 1.4.8 we saw an example of a metric space which con- 
tained Cauchy sequences which did not converge. However, in Theorem 
6.4.18 we saw that in the metric space (R,d), every Cauchy sequence 
did have a limit. This motivates the following definition. 


Definition 1.4.10 (Complete metric spaces). A metric space (X, d) 
is said to be complete iff every Cauchy sequence in (X,d) is in fact 
convergent in (X,d). 




Example 1.4.11. By Theorem 6.4.18, the reals (R, d) are complete; by 
Example 1.4.8, the rationals (Q, d), on the other hand, are not complete. 


Complete metric spaces have some nice properties. For instance, 
they are intrinsically closed: no matter what space one places them in, 
they are always closed sets. More precisely: 


Proposition 1.4.12. (a) Let (X,d) be a metric space, and let
$(Y, d|_{Y \times Y})$ be a subspace of (X,d). If $(Y, d|_{Y \times Y})$ is complete, then
Y must be closed in X.


(b) Conversely, suppose that (X,d) is a complete metric space, and
Y is a closed subset of X. Then the subspace $(Y, d|_{Y \times Y})$ is also
complete.


Proof. See Exercise 1.4.7. 


In contrast, an incomplete metric space such as $(\mathbf{Q}, d)$ may be
considered closed in some spaces (for instance, $\mathbf{Q}$ is closed in $\mathbf{Q}$) but not
in others (for instance, $\mathbf{Q}$ is not closed in $\mathbf{R}$). Indeed, it turns out
that given any incomplete metric space (X,d), there exists a completion
$(\overline{X}, \overline{d})$, which is a larger metric space containing (X,d) which is
complete, and such that X is not closed in $\overline{X}$ (indeed, the closure of X in
$(\overline{X}, \overline{d})$ will be all of $\overline{X}$); see Exercise 1.4.8. For instance, one possible
completion of $\mathbf{Q}$ is $\mathbf{R}$.


— Exercises — 


Exercise 1.4.1. Prove Lemma 1.4.3. (Hint: review your proof of Proposition 
6.6.5.) 


Exercise 1.4.2. Prove Proposition 1.4.5. (Hint: review your proof of Proposi- 
tion 6.6.6.) 


Exercise 1.4.3. Prove Lemma 1.4.7. (Hint: review your proof of Proposition 
6.1.12.) 


Exercise 1.4.4. Prove Lemma 1.4.9. 


Exercise 1.4.5. Let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in a metric space (X,d),
and let $L \in X$. Show that if L is a limit point of the sequence $(x^{(n)})_{n=m}^\infty$, then
L is an adherent point of the set $\{x^{(n)} : n \geq m\}$. Is the converse true?


Exercise 1.4.6. Show that every Cauchy sequence can have at most one limit
point.


Exercise 1.4.7. Prove Proposition 1.4.12.


Exercise 1.4.8. The following construction generalizes the construction of the
reals from the rationals in Chapter 5, allowing one to view any metric space
as a subspace of a complete metric space. In what follows we let (X,d) be a
metric space.


(a) Given any Cauchy sequence $(x_n)_{n=1}^\infty$ in X, we introduce the formal limit
$\mathrm{LIM}_{n \to \infty} x_n$. We say that two formal limits $\mathrm{LIM}_{n \to \infty} x_n$ and $\mathrm{LIM}_{n \to \infty} y_n$
are equal if $\lim_{n \to \infty} d(x_n, y_n)$ is equal to zero. Show that this equality
relation obeys the reflexive, symmetry, and transitive axioms.


(b) Let $\overline{X}$ be the space of all formal limits of Cauchy sequences in X, with
the above equality relation. Define a metric $d_{\overline{X}} : \overline{X} \times \overline{X} \to \mathbf{R}^+$ by setting

$$d_{\overline{X}}(\mathrm{LIM}_{n \to \infty} x_n, \mathrm{LIM}_{n \to \infty} y_n) := \lim_{n \to \infty} d(x_n, y_n).$$

Show that this function is well-defined (this means not only that the
limit $\lim_{n \to \infty} d(x_n, y_n)$ exists, but also that the axiom of substitution is
obeyed; cf. Lemma 5.3.7), and gives $\overline{X}$ the structure of a metric space.


(c) Show that the metric space $(\overline{X}, d_{\overline{X}})$ is complete.


(d) We identify an element $x \in X$ with the corresponding formal limit
$\mathrm{LIM}_{n \to \infty} x$ in $\overline{X}$; show that this is legitimate by verifying that
$x = y \iff \mathrm{LIM}_{n \to \infty} x = \mathrm{LIM}_{n \to \infty} y$. With this identification, show that
$d(x,y) = d_{\overline{X}}(x,y)$, and thus (X,d) can now be thought of as a subspace
of $(\overline{X}, d_{\overline{X}})$.


(e) Show that the closure of X in $\overline{X}$ is $\overline{X}$ (which explains the choice of
notation $\overline{X}$).


(f) Show that the formal limit agrees with the actual limit, thus if $(x_n)_{n=1}^\infty$
is any Cauchy sequence in X, then we have $\lim_{n \to \infty} x_n = \mathrm{LIM}_{n \to \infty} x_n$
in $\overline{X}$.


1.5 Compact metric spaces 


We now come to one of the most useful notions in point set topology, 
that of compactness. Recall the Heine-Borel theorem (Theorem 9.1.24), 
which asserted that every sequence in a closed and bounded subset X 
of the real line R had a convergent subsequence whose limit was also 
in X. Conversely, only the closed and bounded sets have this property. 
This property turns out to be so useful that we give it a name. 


Definition 1.5.1 (Compactness). A metric space (X,d) is said to be
compact iff every sequence in (X,d) has at least one convergent
subsequence. A subset Y of a metric space X is said to be compact if the
subspace $(Y, d|_{Y \times Y})$ is compact.


Remark 1.5.2. The notion of a set Y being compact is intrinsic, in
the sense that it only depends on the metric function $d|_{Y \times Y}$ restricted
to Y, and not on the choice of the ambient space X. The notions of
completeness in Definition 1.4.10, and of boundedness below in
Definition 1.5.3, are also intrinsic, but the notions of open and closed are not
(see the discussion in Section 1.3). 


Thus, Theorem 9.1.24 shows that in the real line R with the usual 
metric, every closed and bounded set is compact, and conversely every 
compact set is closed and bounded. 

Now we investigate how the Heine-Borel theorem extends to other metric
spaces.


Definition 1.5.3 (Bounded sets). Let (X,d) be a metric space, and let 
Y be a subset of X. We say that Y is bounded iff there exists a ball
$B(x, r)$ in X which contains Y.


Remark 1.5.4. This definition is compatible with the definition of a 
bounded set in Definition 9.1.22 (Exercise 1.5.1). 


Proposition 1.5.5. Let (X,d) be a compact metric space. Then (X, d) 
is both complete and bounded. 


Proof. See Exercise 1.5.2. 


From this proposition and Proposition 1.4.12(a) we obtain one half 
of the Heine-Borel theorem for general metric spaces: 


Corollary 1.5.6 (Compact sets are closed and bounded). Let (X,d) be 
a metric space, and let Y be a compact subset of X. Then Y is closed 
and bounded. 


The other half of the Heine-Borel theorem is true in Euclidean spaces: 


Theorem 1.5.7 (Heine-Borel theorem). Let $(\mathbf{R}^n, d)$ be a Euclidean
space with either the Euclidean metric, the taxicab metric, or the sup
norm metric. Let E be a subset of $\mathbf{R}^n$. Then E is compact if and only
if it is closed and bounded.


Proof. See Exercise 1.5.3. 


However, the Heine-Borel theorem is not true for more general
metrics. For instance, the integers $\mathbf{Z}$ with the discrete metric is closed
(indeed, it is complete) and bounded, but not compact, since the sequence
1,2,3,4,... is in $\mathbf{Z}$ but has no convergent subsequence (why?). Another
example is in Exercise 1.5.8. However, a version of the Heine-Borel
theorem is available if one is willing to replace closedness with the stronger
notion of completeness, and boundedness with the stronger notion of
total boundedness; see Exercise 1.5.10.
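
The counterexample with the integers can be made concrete. The sketch below looks only at the first fifty terms of 1,2,3,4,..., so it illustrates the phenomenon rather than proving it: every pair of distinct terms is at discrete distance exactly 1, while all terms sit inside a single ball of radius 2. The variable names are chosen for this illustration.

```python
def d_disc(x, y):
    """Discrete metric on the integers."""
    return 0 if x == y else 1

terms = list(range(1, 51))   # the first fifty terms of 1, 2, 3, 4, ...

# Distances between distinct terms: always 1, so no tail can be Cauchy.
pairwise_distances = {
    d_disc(terms[j], terms[k])
    for j in range(len(terms))
    for k in range(j + 1, len(terms))
}

# Boundedness: every term lies in the ball B(0, 2).
inside_one_ball = all(d_disc(n, 0) < 2 for n in terms)
```

Since the terms never get closer than distance 1 to each other, no subsequence can be Cauchy, and hence by Lemma 1.4.7 none can converge.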

One can characterize compactness topologically via the following, 
rather strange-sounding statement: every open cover of a compact set 
has a finite subcover. 


Theorem 1.5.8. Let (X,d) be a metric space, and let Y be a compact
subset of X. Let $(V_\alpha)_{\alpha \in I}$ be a collection of open sets in X, and suppose
that

$$Y \subseteq \bigcup_{\alpha \in I} V_\alpha$$

(i.e., the collection $(V_\alpha)_{\alpha \in I}$ covers Y). Then there exists a finite subset
F of I such that

$$Y \subseteq \bigcup_{\alpha \in F} V_\alpha.$$


Proof. We assume for sake of contradiction that there does not exist any
finite subset F of I for which $Y \subseteq \bigcup_{\alpha \in F} V_\alpha$.

Let y be any element of Y. Then y must lie in at least one of the
sets $V_\alpha$. Since each $V_\alpha$ is open, there must therefore be an r > 0 such
that $B_{(X,d)}(y, r) \subseteq V_\alpha$. Now let r(y) denote the quantity

$$r(y) := \sup\{r \in (0, \infty) : B_{(X,d)}(y, r) \subseteq V_\alpha \text{ for some } \alpha \in I\}.$$

By the above discussion, we know that r(y) > 0 for all $y \in Y$. Now, let
$r_0$ denote the quantity

$$r_0 := \inf\{r(y) : y \in Y\}.$$

Since r(y) > 0 for all $y \in Y$, we have $r_0 \geq 0$. There are two cases:
$r_0 = 0$ and $r_0 > 0$.


• Case 1: $r_0 = 0$. Then for every integer $n \geq 1$, there is at least
one point y in Y such that r(y) < 1/n (why?). We thus choose,
for each $n \geq 1$, a point $y^{(n)}$ in Y such that $r(y^{(n)}) < 1/n$ (we
can do this because of the axiom of choice, see Proposition 8.4.7).
In particular we have $\lim_{n \to \infty} r(y^{(n)}) = 0$, by the squeeze test.
The sequence $(y^{(n)})_{n=1}^\infty$ is a sequence in Y; since Y is compact, we
can thus find a subsequence $(y^{(n_j)})_{j=1}^\infty$ which converges to a point
$y_0 \in Y$.


As before, we know that there exists some $\alpha \in I$ such that $y_0 \in V_\alpha$,
and hence (since $V_\alpha$ is open) there exists some $\varepsilon > 0$ such that
$B(y_0, \varepsilon) \subseteq V_\alpha$. Since $y^{(n_j)}$ converges to $y_0$, there must exist an
$N \geq 1$ such that $y^{(n_j)} \in B(y_0, \varepsilon/2)$ for all $j \geq N$. In particular,
by the triangle inequality we have $B(y^{(n_j)}, \varepsilon/2) \subseteq B(y_0, \varepsilon)$, and
thus $B(y^{(n_j)}, \varepsilon/2) \subseteq V_\alpha$. By definition of $r(y^{(n_j)})$, this implies
that $r(y^{(n_j)}) \geq \varepsilon/2$ for all $j \geq N$. But this contradicts the fact
that $\lim_{n \to \infty} r(y^{(n)}) = 0$.


• Case 2: $r_0 > 0$. In this case we now have $r(y) > r_0/2$ for all
$y \in Y$. This implies that for every $y \in Y$ there exists an $\alpha \in I$
such that $B(y, r_0/2) \subseteq V_\alpha$ (why?).


We now construct a sequence $y^{(1)}, y^{(2)}, \ldots$ by the following
recursive procedure. We let $y^{(1)}$ be any point in Y. The ball
$B(y^{(1)}, r_0/2)$ is contained in one of the $V_\alpha$ and thus cannot cover
all of Y, since we would then obtain a finite cover, a contradiction.
Thus there exists a point $y^{(2)}$ which does not lie in $B(y^{(1)}, r_0/2)$,
so in particular $d(y^{(2)}, y^{(1)}) \geq r_0/2$. Choose such a point $y^{(2)}$. The
set $B(y^{(1)}, r_0/2) \cup B(y^{(2)}, r_0/2)$ cannot cover all of Y, since we
would then obtain two sets $V_{\alpha_1}$ and $V_{\alpha_2}$ which covered Y, a
contradiction again. So we can choose a point $y^{(3)}$ which does not lie
in $B(y^{(1)}, r_0/2) \cup B(y^{(2)}, r_0/2)$, so in particular $d(y^{(3)}, y^{(1)}) \geq r_0/2$
and $d(y^{(3)}, y^{(2)}) \geq r_0/2$. Continuing in this fashion we obtain a
sequence $(y^{(n)})_{n=1}^\infty$ in Y with the property that $d(y^{(k)}, y^{(j)}) \geq r_0/2$
for all k > j. In particular the sequence $(y^{(n)})_{n=1}^\infty$ is not a Cauchy
sequence, and in fact no subsequence of $(y^{(n)})_{n=1}^\infty$ can be a Cauchy
sequence either. But this contradicts the assumption that Y is
compact (by Lemma 1.4.7).
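
The recursive construction in Case 2 is a greedy selection of points that stay a fixed distance apart. The sketch below runs the same selection on a finite sample of the unit square. It is an illustration under assumptions made here (the sample `grid`, the radius `r` playing the role of $r_0/2$, and the function name `greedy_separated`); on a finite sample the procedure necessarily stops, which mirrors why it cannot continue forever in a compact set.

```python
import math
from itertools import product

def d(p, q):
    """Euclidean metric on the plane."""
    return math.dist(p, q)

def greedy_separated(points, r):
    """Scan `points` in order, keeping each point that is at distance
    at least r from every point kept so far."""
    chosen = []
    for p in points:
        if all(d(p, c) >= r for c in chosen):
            chosen.append(p)
    return chosen

# A 21 x 21 sample of the unit square (an assumption of this sketch).
grid = [(i / 20, j / 20) for i, j in product(range(21), repeat=2)]
r = 0.25
centers = greedy_separated(grid, r)

# Chosen points are pairwise at distance >= r ...
separated = all(d(p, q) >= r for i, p in enumerate(centers) for q in centers[:i])
# ... and once the selection stops, the balls B(c, r) cover the whole sample.
covered = all(any(d(p, c) < r for c in centers) for p in grid)
```

When the selection stops, the finitely many balls around the chosen points cover the set, which is the finite cover whose non-existence was assumed in the proof.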


It turns out that Theorem 1.5.8 has a converse: if $Y$ has the property
that every open cover has a finite sub-cover, then it is compact (Exercise
1.5.11). In fact, this property is often considered a more fundamental
notion of compactness than the sequence-based one. (For metric spaces,
the two notions, that of compactness and sequential compactness, are
equivalent, but for more general topological spaces, the two notions are
slightly different; see Exercise 2.5.8.)

Theorem 1.5.8 has an important corollary: every nested sequence of
non-empty compact sets has a non-empty intersection.


Corollary 1.5.9. Let $(X,d)$ be a metric space, and let $K_1, K_2, K_3, \ldots$
be a sequence of non-empty compact subsets of $X$ such that


$K_1 \supseteq K_2 \supseteq K_3 \supseteq \ldots.$


Then the intersection $\bigcap_{n=1}^\infty K_n$ is non-empty.


Proof. See Exercise 1.5.6. 
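
A worked instance may help fix ideas; the sets below are illustrative choices and are not taken from the text. In $\mathbf{R}$ with the standard metric, the closed intervals $K_n := [0, 1/n]$ are compact (by the Heine-Borel theorem), non-empty and nested, and their intersection is the single point $0$. The compactness hypothesis cannot simply be dropped: the nested non-empty sets $E_n := (0, 1/n)$ have empty intersection.

```latex
K_n := [0, 1/n]: \qquad K_1 \supseteq K_2 \supseteq K_3 \supseteq \ldots, \qquad
\bigcap_{n=1}^\infty K_n = \{0\} \neq \emptyset;
\\
E_n := (0, 1/n): \qquad E_1 \supseteq E_2 \supseteq E_3 \supseteq \ldots, \qquad
\bigcap_{n=1}^\infty E_n = \emptyset .
```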


We close this section by listing some miscellaneous properties of com- 
pact sets. 


Theorem 1.5.10. Let (X,d) be a metric space. 


(a) If $Y$ is a compact subset of $X$, and $Z \subseteq Y$, then $Z$ is compact if
and only if $Z$ is closed.


(b) If $Y_1, \ldots, Y_n$ are a finite collection of compact subsets of $X$, then
their union $Y_1 \cup \ldots \cup Y_n$ is also compact.


(c) Every finite subset of X (including the empty set) is compact. 


Proof. See Exercise 1.5.7. 


— Exercises — 


Exercise 1.5.1. Show that Definitions 9.1.22 and 1.5.3 match when talking 
about subsets of the real line with the standard metric. 


Exercise 1.5.2. Prove Proposition 1.5.5. (Hint: prove the completeness and 
boundedness separately. For both claims, use proof by contradiction. You will 
need the axiom of choice, as in Lemma 8.4.5.) 


Exercise 1.5.3. Prove Theorem 1.5.7. (Hint: use Proposition 1.1.18 and Theo- 
rem 9.1.24.) 


Exercise 1.5.4. Let $(\mathbf{R},d)$ be the real line with the standard metric. Give an
example of a continuous function $f : \mathbf{R} \to \mathbf{R}$, and an open set $V \subseteq \mathbf{R}$, such
that the image $f(V) := \{f(x) : x \in V\}$ of $V$ is not open.


Exercise 1.5.5. Let $(\mathbf{R},d)$ be the real line with the standard metric. Give an
example of a continuous function $f : \mathbf{R} \to \mathbf{R}$, and a closed set $F \subseteq \mathbf{R}$, such
that $f(F)$ is not closed.


Exercise 1.5.6. Prove Corollary 1.5.9. (Hint: work in the compact metric space
$(K_1, d|_{K_1 \times K_1})$, and consider the sets $V_n := K_1 \setminus K_n$, which are open on $K_1$.
Assume for sake of contradiction that $\bigcap_{n=1}^\infty K_n = \emptyset$, and then apply Theorem
1.5.8.)


Exercise 1.5.7. Prove Theorem 1.5.10. (Hint: for part (c), you may wish to
use (b), and first prove that every singleton set is compact.)


Exercise 1.5.8. Let $(X, d_{l^1})$ be the metric space from Exercise 1.1.15. For each
natural number $n$, let $e^{(n)} = (e^{(n)}_j)_{j=0}^\infty$ be the sequence in $X$ such that $e^{(n)}_j := 1$
when $n = j$ and $e^{(n)}_j := 0$ when $n \neq j$. Show that the set $\{e^{(n)} : n \in \mathbf{N}\}$ is
a closed and bounded subset of $X$, but is not compact. (This is despite the
fact that $(X, d_{l^1})$ is even a complete metric space - a fact which we will not
prove here. The problem is not that $X$ is incomplete, but rather that it
is “infinite-dimensional”, in a sense that we will not discuss here.)


Exercise 1.5.9. Show that a metric space (X,d) is compact if and only if every 
sequence in X has at least one limit point. 


Exercise 1.5.10. A metric space $(X,d)$ is called totally bounded if for
every $\varepsilon > 0$, there exists a positive integer $n$ and a finite number of balls
$B(x^{(1)},\varepsilon), \ldots, B(x^{(n)},\varepsilon)$ which cover $X$ (i.e., $X = \bigcup_{i=1}^n B(x^{(i)}, \varepsilon)$).


(a) Show that every totally bounded space is bounded. 


(b) Show the following stronger version of Proposition 1.5.5: if $(X,d)$ is
compact, then it is complete and totally bounded. (Hint: if $X$ is not totally
bounded, then there is some $\varepsilon > 0$ such that $X$ cannot be covered by
finitely many $\varepsilon$-balls. Then use Exercise 8.5.20 to find an infinite
sequence of balls $B(x^{(n)}, \varepsilon/2)$ which are disjoint from each other. Use this
to then construct a sequence which has no convergent subsequence.)


(c) Conversely, show that if $X$ is complete and totally bounded, then $X$ is
compact. (Hint: if $(x^{(n)})_{n=1}^\infty$ is a sequence in $X$, use the total
boundedness hypothesis to recursively construct a sequence of subsequences
$(x^{(n;j)})_{n=1}^\infty$ of $(x^{(n)})_{n=1}^\infty$ for each positive integer $j$, such that for each $j$,
the elements of the sequence $(x^{(n;j)})_{n=1}^\infty$ are contained in a single ball of
radius $1/j$, and also that each sequence $(x^{(n;j+1)})_{n=1}^\infty$ is a subsequence of
the previous one $(x^{(n;j)})_{n=1}^\infty$. Then show that the “diagonal” sequence
$(x^{(n;n)})_{n=1}^\infty$ is a Cauchy sequence, and then use the completeness
hypothesis.)


Exercise 1.5.11. Let $(X,d)$ have the property that every open cover of $X$ has
a finite subcover. Show that $X$ is compact. (Hint: if $X$ is not compact, then
by Exercise 1.5.9, there is a sequence $(x^{(n)})_{n=1}^\infty$ with no limit points. Then for
every $x \in X$ there exists a ball $B(x,\varepsilon)$ containing $x$ which contains at most
finitely many elements of this sequence. Now use the hypothesis.)


Exercise 1.5.12. Let $(X, d_{disc})$ be a metric space with the discrete metric $d_{disc}$.


(a) Show that $X$ is always complete.


(b) When is X compact, and when is X not compact? Prove your claim. 
(Hint: the Heine-Borel theorem will be useless here since that only ap- 
plies to Euclidean spaces.) 


Exercise 1.5.13. Let $E$ and $F$ be two compact subsets of $\mathbf{R}$ (with the standard
metric $d(x,y) = |x - y|$). Show that the Cartesian product $E \times F := \{(x,y) :
x \in E, y \in F\}$ is a compact subset of $\mathbf{R}^2$ (with the Euclidean metric $d_{l^2}$).


Exercise 1.5.14. Let $(X,d)$ be a metric space, let $E$ be a non-empty compact
subset of $X$, and let $x_0$ be a point in $X$. Show that there exists a point $x \in E$
such that

$d(x_0,x) = \inf\{d(x_0,y) : y \in E\},$


i.e., $x$ is the closest point in $E$ to $x_0$. (Hint: let $R$ be the quantity
$R := \inf\{d(x_0,y) : y \in E\}$. Construct a sequence $(x^{(n)})_{n=1}^\infty$ in $E$ such that
$d(x_0,x^{(n)}) \leq R + \frac{1}{n}$, and then use the compactness of $E$.)


Exercise 1.5.15. Let $(X,d)$ be a compact metric space. Suppose that $(K_\alpha)_{\alpha \in I}$
is a collection of closed sets in $X$ with the property that any finite subcollection
of these sets necessarily has non-empty intersection, thus $\bigcap_{\alpha \in F} K_\alpha \neq \emptyset$ for all
finite $F \subseteq I$. (This property is known as the finite intersection property.) Show
that the entire collection has non-empty intersection, thus $\bigcap_{\alpha \in I} K_\alpha \neq \emptyset$. Show
by counterexample that this statement fails if $X$ is not compact.


Chapter 2 


Continuous functions on metric spaces 


2.1 Continuous functions 


In the previous chapter we studied a single metric space (X,d), and the 
various types of sets one could find in that space. While this is already 
quite a rich subject, the theory of metric spaces becomes even richer, 
and of more importance to analysis, when one considers not just a single 
metric space, but rather pairs $(X,d_X)$ and $(Y,d_Y)$ of metric spaces, as
well as continuous functions $f : X \to Y$ between such spaces. To define
this concept, we generalize Definition 9.4.1 as follows: 


Definition 2.1.1 (Continuous functions). Let $(X,d_X)$ be a metric
space, and let $(Y,d_Y)$ be another metric space, and let $f : X \to Y$
be a function. If $x_0 \in X$, we say that $f$ is continuous at $x_0$ iff for every
$\varepsilon > 0$, there exists a $\delta > 0$ such that $d_Y(f(x), f(x_0)) < \varepsilon$ whenever
$d_X(x,x_0) < \delta$. We say that $f$ is continuous iff it is continuous at every
point $x \in X$.
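
To see the definition in action, the following sketch checks it numerically for the squaring function on the real line with the standard metric. The helper `delta_for` is hypothetical, and the formula $\delta := \min(1, \varepsilon/(2|x_0|+1))$ is one convenient choice (an assumption of this illustration, not the only valid one): if $|x - x_0| < \delta$ then $|x + x_0| < 2|x_0| + 1$, and hence $|x^2 - x_0^2| < \varepsilon$. Sampling finitely many points is of course not a proof; it only illustrates the roles of $\varepsilon$ and $\delta$.

```python
def f(x):
    return x * x


def delta_for(x0, eps):
    """One admissible delta for f(x) = x^2 at x0 (illustrative choice)."""
    return min(1.0, eps / (2 * abs(x0) + 1))


def check_continuity_at(x0, eps, samples=1000):
    """Sample points with |x - x0| < delta and test |f(x) - f(x0)| < eps."""
    delta = delta_for(x0, eps)
    for k in range(-samples + 1, samples):
        x = x0 + delta * k / samples  # |x - x0| < delta
        if abs(f(x) - f(x0)) >= eps:
            return False
    return True


results = [check_continuity_at(x0, eps)
           for x0 in (-3.0, 0.0, 0.5, 10.0)
           for eps in (1.0, 0.1, 0.001)]
```

Note that the chosen $\delta$ shrinks as $|x_0|$ grows, which is allowed here because the definition lets $\delta$ depend on the point $x_0$.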


Remark 2.1.2. Continuous functions are also sometimes called con- 
tinuous maps. Mathematically, there is no distinction between the two 
terminologies. 


Remark 2.1.3. If $f : X \to Y$ is continuous, and $K$ is any subset of $X$,
then the restriction $f|_K : K \to Y$ of $f$ to $K$ is also continuous (why?).


We now generalize much of the discussion in Chapter 9. We first 
observe that continuous functions preserve convergence: 


Theorem 2.1.4 (Continuity preserves convergence). Suppose that
$(X,d_X)$ and $(Y,d_Y)$ are metric spaces. Let $f : X \to Y$ be a function,
and let $x_0 \in X$ be a point in $X$. Then the following three statements are
logically equivalent:


(a) $f$ is continuous at $x_0$.


(b) Whenever $(x^{(n)})_{n=1}^\infty$ is a sequence in $X$ which converges to $x_0$ with
respect to the metric $d_X$, the sequence $(f(x^{(n)}))_{n=1}^\infty$ converges to
$f(x_0)$ with respect to the metric $d_Y$.


(c) For every open set $V \subseteq Y$ that contains $f(x_0)$, there exists an open
set $U \subseteq X$ containing $x_0$ such that $f(U) \subseteq V$.


Proof. See Exercise 2.1.1. 


Another important characterization of continuous functions involves 
open sets. 


Theorem 2.1.5. Let $(X,d_X)$ be a metric space, and let $(Y,d_Y)$ be
another metric space. Let $f : X \to Y$ be a function. Then the following
four statements are equivalent:


(a) $f$ is continuous.


(b) Whenever $(x^{(n)})_{n=1}^\infty$ is a sequence in $X$ which converges to some
point $x_0 \in X$ with respect to the metric $d_X$, the sequence
$(f(x^{(n)}))_{n=1}^\infty$ converges to $f(x_0)$ with respect to the metric $d_Y$.


(c) Whenever $V$ is an open set in $Y$, the set $f^{-1}(V) := \{x \in X :
f(x) \in V\}$ is an open set in $X$.


(d) Whenever $F$ is a closed set in $Y$, the set $f^{-1}(F) := \{x \in X :
f(x) \in F\}$ is a closed set in $X$.


Proof. See Exercise 2.1.2. 


Remark 2.1.6. It may seem strange that continuity ensures that the 
inverse image of an open set is open. One may guess instead that the 
reverse should be true, that the forward image of an open set is open; 
but this is not true; see Exercises 1.5.4, 1.5.5. 
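
For a concrete instance of Theorem 2.1.5(c) and (d), take the squaring function $f(x) := x^2$ on $\mathbf{R}$ with the standard metric (an illustrative choice, not an example from the text). Inverse images of open intervals are open, and inverse images of closed intervals are closed:

```latex
f^{-1}((1,4)) = (-2,-1) \cup (1,2), \qquad
f^{-1}((-1,1)) = (-1,1), \qquad
f^{-1}([1,4]) = [-2,-1] \cup [1,2].
```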


As a quick corollary of the above two Theorems we obtain 


Corollary 2.1.7 (Continuity preserved by composition). Let $(X,d_X)$,
$(Y,d_Y)$, and $(Z,d_Z)$ be metric spaces.


(a) If $f : X \to Y$ is continuous at a point $x_0 \in X$, and $g : Y \to Z$ is
continuous at $f(x_0)$, then the composition $g \circ f : X \to Z$, defined
by $g \circ f(x) := g(f(x))$, is continuous at $x_0$.


(b) If $f : X \to Y$ is continuous, and $g : Y \to Z$ is continuous, then
$g \circ f : X \to Z$ is also continuous.


Proof. See Exercise 2.1.3. 


Example 2.1.8. If $f : X \to \mathbf{R}$ is a continuous function, then the
function $f^2 : X \to \mathbf{R}$ defined by $f^2(x) := f(x)^2$ is automatically continuous
also. This is because we have $f^2 = g \circ f$, where $g : \mathbf{R} \to \mathbf{R}$ is the
squaring function $g(x) := x^2$, and $g$ is a continuous function.


— Exercises — 


Exercise 2.1.1. Prove Theorem 2.1.4. (Hint: review your proof of Proposition 
9.4.7.) 


Exercise 2.1.2. Prove Theorem 2.1.5. (Hint: Theorem 2.1.4 already shows that 
(a) and (b) are equivalent.) 


Exercise 2.1.3. Use Theorem 2.1.4 and Theorem 2.1.5 to prove Corollary 2.1.7. 


Exercise 2.1.4. Give an example of functions $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$ such
that


(a) $f$ is not continuous, but $g$ and $g \circ f$ are continuous;
(b) $g$ is not continuous, but $f$ and $g \circ f$ are continuous;
(c) $f$ and $g$ are not continuous, but $g \circ f$ is continuous.


Explain briefly why these examples do not contradict Corollary 2.1.7. 


Exercise 2.1.5. Let $(X,d)$ be a metric space, and let $(E, d|_{E \times E})$ be a
subspace of $(X,d)$. Let $\iota_{E \to X} : E \to X$ be the inclusion map, defined by setting
$\iota_{E \to X}(x) := x$ for all $x \in E$. Show that $\iota_{E \to X}$ is continuous.


Exercise 2.1.6. Let $f : X \to Y$ be a function from one metric space $(X,d_X)$
to another $(Y,d_Y)$. Let $E$ be a subset of $X$ (which we give the induced metric
$d_X|_{E \times E}$), and let $f|_E : E \to Y$ be the restriction of $f$ to $E$, thus $f|_E(x) := f(x)$
when $x \in E$. If $x_0 \in E$ and $f$ is continuous at $x_0$, show that $f|_E$ is also
continuous at $x_0$. (Is the converse of this statement true? Explain.) Conclude
that if $f$ is continuous, then $f|_E$ is continuous. Thus restriction of the domain
of a function does not destroy continuity. (Hint: use Exercise 2.1.5.)


Exercise 2.1.7. Let $f : X \to Y$ be a function from one metric space $(X,d_X)$
to another $(Y,d_Y)$. Suppose that the image $f(X)$ of $X$ is contained in some
subset $E \subseteq Y$ of $Y$. Let $g : X \to E$ be the function which is the same as $f$
but with the range restricted from $Y$ to $E$, thus $g(x) = f(x)$ for all $x \in X$. We
give $E$ the metric $d_Y|_{E \times E}$ induced from $Y$. Show that for any $x_0 \in X$,
$f$ is continuous at $x_0$ if and only if $g$ is continuous at $x_0$. Conclude that $f$ is
continuous if and only if $g$ is continuous. (Thus the notion of continuity is not
affected if one restricts the range of the function.)


2.2 Continuity and product spaces


Given two functions $f : X \to Y$ and $g : X \to Z$, one can define their
direct sum $f \oplus g : X \to Y \times Z$ defined by $f \oplus g(x) := (f(x), g(x))$,
i.e., this is the function taking values in the Cartesian product $Y \times Z$
whose first co-ordinate is $f(x)$ and whose second co-ordinate is $g(x)$ (cf.
Exercise 3.5.7). For instance, if $f : \mathbf{R} \to \mathbf{R}$ is the function $f(x) := x^2+3$,
and $g : \mathbf{R} \to \mathbf{R}$ is the function $g(x) := 4x$, then $f \oplus g : \mathbf{R} \to \mathbf{R}^2$ is the
function $f \oplus g(x) := (x^2+3, 4x)$. The direct sum operation preserves
continuity:


Lemma 2.2.1. Let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ be functions, and let
$f \oplus g : X \to \mathbf{R}^2$ be their direct sum. We give $\mathbf{R}^2$ the Euclidean metric.


(a) If $x_0 \in X$, then $f$ and $g$ are both continuous at $x_0$ if and only if
$f \oplus g$ is continuous at $x_0$.


(b) $f$ and $g$ are both continuous if and only if $f \oplus g$ is continuous.


Proof. See Exercise 2.2.1. 
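
In programming terms the direct sum simply pairs the two outputs. The following is a minimal sketch of the example $f(x) := x^2+3$, $g(x) := 4x$ from the start of this section; the name `direct_sum` is hypothetical, and real numbers are modelled by floats.

```python
def direct_sum(f, g):
    """Return the map x -> (f(x), g(x))."""
    return lambda x: (f(x), g(x))


def f(x):
    return x ** 2 + 3


def g(x):
    return 4 * x


h = direct_sum(f, g)

value = h(2.0)  # (2^2 + 3, 4 * 2) = (7.0, 8.0)
```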


To use this, we first need another continuity result: 


Lemma 2.2.2. The addition function $(x,y) \mapsto x+y$, the subtraction
function $(x,y) \mapsto x-y$, the multiplication function $(x,y) \mapsto xy$,
the maximum function $(x,y) \mapsto \max(x,y)$, and the minimum function
$(x,y) \mapsto \min(x,y)$, are all continuous functions from $\mathbf{R}^2$ to $\mathbf{R}$.
The division function $(x,y) \mapsto x/y$ is a continuous function from
$\mathbf{R} \times (\mathbf{R} \setminus \{0\}) = \{(x,y) \in \mathbf{R}^2 : y \neq 0\}$ to $\mathbf{R}$. For any real number
$c$, the function $x \mapsto cx$ is a continuous function from $\mathbf{R}$ to $\mathbf{R}$.


Proof. See Exercise 2.2.2.


Combining these lemmas we obtain


Corollary 2.2.3. Let $(X,d)$ be a metric space, let $f : X \to \mathbf{R}$ and
$g : X \to \mathbf{R}$ be functions. Let $c$ be a real number.


(a) If $x_0 \in X$ and $f$ and $g$ are continuous at $x_0$, then the functions
$f+g : X \to \mathbf{R}$, $f-g : X \to \mathbf{R}$, $fg : X \to \mathbf{R}$, $\max(f,g) : X \to \mathbf{R}$,
$\min(f,g) : X \to \mathbf{R}$, and $cf : X \to \mathbf{R}$ (see Definition 9.2.1 for
definitions) are also continuous at $x_0$. If $g(x) \neq 0$ for all $x \in X$,
then $f/g : X \to \mathbf{R}$ is also continuous at $x_0$.


(b) If $f$ and $g$ are continuous, then the functions $f+g : X \to \mathbf{R}$,
$f-g : X \to \mathbf{R}$, $fg : X \to \mathbf{R}$, $\max(f,g) : X \to \mathbf{R}$, $\min(f,g) : X \to \mathbf{R}$,
and $cf : X \to \mathbf{R}$ are also continuous. If $g(x) \neq 0$ for all
$x \in X$, then $f/g : X \to \mathbf{R}$ is also continuous.


Proof. We first prove (a). Since $f$ and $g$ are continuous at $x_0$, then by
Lemma 2.2.1 $f \oplus g : X \to \mathbf{R}^2$ is also continuous at $x_0$. On the other
hand, from Lemma 2.2.2 the function $(x,y) \mapsto x + y$ is continuous at
every point in $\mathbf{R}^2$, and in particular is continuous at $f \oplus g(x_0)$. If we
then compose these two functions using Corollary 2.1.7 we conclude that
$f+g : X \to \mathbf{R}$ is continuous at $x_0$. A similar argument gives the continuity of
$f-g$, $fg$, $\max(f,g)$, $\min(f,g)$ and $cf$ at $x_0$. To prove the claim for $f/g$, we
first use Exercise 2.1.7 to restrict the range of $g$ from $\mathbf{R}$ to $\mathbf{R} \setminus \{0\}$, and
then one can argue as before. The claim (b) follows immediately from (a).
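
The structure of this proof, writing $f+g$ as the composition of the addition map with $f \oplus g$, can be mirrored in code. The names `compose`, `direct_sum` and `add` below are hypothetical, and the choice of sine and cosine for $f$ and $g$ is only for illustration; the sketch shows the factorisation and does not itself verify continuity.

```python
import math


def direct_sum(f, g):
    """x -> (f(x), g(x)), the map f (+) g from Lemma 2.2.1."""
    return lambda x: (f(x), g(x))


def compose(outer, inner):
    """x -> outer(inner(x)), as in Corollary 2.1.7."""
    return lambda x: outer(inner(x))


def add(pair):
    """The addition map (x, y) -> x + y from Lemma 2.2.2."""
    x, y = pair
    return x + y


f = math.sin
g = math.cos
f_plus_g = compose(add, direct_sum(f, g))

# Agrees with the pointwise sum f(x) + g(x).
agree = all(f_plus_g(t) == f(t) + g(t) for t in (0.0, 0.5, 1.0, 2.0))
```

Replacing `add` by a multiplication, maximum or minimum map gives $fg$, $\max(f,g)$ and $\min(f,g)$ in the same way, which is why one argument handles all of these cases.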


This corollary allows us to demonstrate the continuity of a large class 
of functions; we give some examples below. 


— Exercises — 


Exercise 2.2.1. Prove Lemma 2.2.1. (Hint: use Proposition 1.1.18 and Theorem 
2.1.4.) 


Exercise 2.2.2. Prove Lemma 2.2.2. (Hint: use Theorem 2.1.5 and limit laws 
(Theorem 6.1.19).) 


Exercise 2.2.3. Show that if f : X — R is a continuous function, so is the 
function |f| : X — R defined by |f|(x) := |f(x)|. 


Exercise 2.2.4. Let $\pi_1 : \mathbf{R}^2 \to \mathbf{R}$ and $\pi_2 : \mathbf{R}^2 \to \mathbf{R}$ be the functions $\pi_1(x,y) :=
x$ and $\pi_2(x,y) := y$ (these two functions are sometimes called the co-ordinate
functions on $\mathbf{R}^2$). Show that $\pi_1$ and $\pi_2$ are continuous. Conclude that if
$f : \mathbf{R} \to X$ is any continuous function into a metric space $(X,d)$, then the
functions $g_1 : \mathbf{R}^2 \to X$ and $g_2 : \mathbf{R}^2 \to X$ defined by $g_1(x,y) := f(x)$ and
$g_2(x,y) := f(y)$ are also continuous.


Exercise 2.2.5. Let $n, m \geq 0$ be integers. Suppose that for every $0 \leq i \leq n$ and
$0 \leq j \leq m$ we have a real number $c_{ij}$. Form the function $P : \mathbf{R}^2 \to \mathbf{R}$ defined
by

$P(x,y) := \sum_{i=0}^n \sum_{j=0}^m c_{ij} x^i y^j.$

(Such a function is known as a polynomial of two variables; a typical example
of such a polynomial is $P(x,y) = x^3 + 2xy^2 - x^2 + 3y + 6$.) Show that $P$
is continuous. (Hint: use Exercise 2.2.4 and Corollary 2.2.3.) Conclude that
if $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ are continuous functions, then the function
$P(f,g) : X \to \mathbf{R}$ defined by $P(f,g)(x) := P(f(x), g(x))$ is also continuous.


Exercise 2.2.6. Let $\mathbf{R}^m$ and $\mathbf{R}^n$ be Euclidean spaces. If $f : X \to \mathbf{R}^m$ and
$g : X \to \mathbf{R}^n$ are continuous functions, show that $f \oplus g : X \to \mathbf{R}^{m+n}$ is also
continuous, where we have identified $\mathbf{R}^m \times \mathbf{R}^n$ with $\mathbf{R}^{m+n}$ in the obvious
manner. Is the converse statement true?

Exercise 2.2.7. Let $k \geq 1$, let $I$ be a finite subset of $\mathbf{N}^k$, and let $c : I \to \mathbf{R}$ be
a function. Form the function $P : \mathbf{R}^k \to \mathbf{R}$ defined by

$P(x_1, \ldots, x_k) := \sum_{(i_1,\ldots,i_k) \in I} c(i_1,\ldots,i_k) x_1^{i_1} \ldots x_k^{i_k}.$

(Such a function is known as a polynomial of $k$ variables; a typical example of
such a polynomial is $P(x_1,x_2,x_3) = 3x_1^3 x_2 x_3^2 - x_2 x_3^2 + x_1 + 5$.) Show that $P$ is
continuous. (Hint: use induction on $k$, Exercise 2.2.6, and either Exercise 2.2.5
or Lemma 2.2.2.)


Exercise 2.2.8. Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces. Define the metric
$d_{X \times Y} : (X \times Y) \times (X \times Y) \to [0,\infty)$ by the formula

$d_{X \times Y}((x,y),(x',y')) := d_X(x,x') + d_Y(y,y').$

Show that $(X \times Y, d_{X \times Y})$ is a metric space, and deduce an analogue of
Proposition 1.1.18 and Lemma 2.2.1.

Exercise 2.2.9. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be a function from $\mathbf{R}^2$ to $\mathbf{R}$. Let $(x_0,y_0)$ be a
point in $\mathbf{R}^2$. If $f$ is continuous at $(x_0,y_0)$, show that

$\lim_{x \to x_0} \limsup_{y \to y_0} f(x,y) = \lim_{y \to y_0} \limsup_{x \to x_0} f(x,y) = f(x_0,y_0)$

and

$\lim_{x \to x_0} \liminf_{y \to y_0} f(x,y) = \lim_{y \to y_0} \liminf_{x \to x_0} f(x,y) = f(x_0,y_0).$

(Recall that $\limsup_{x \to x_0} f(x) := \inf_{r>0} \sup_{|x-x_0|<r} f(x)$ and
$\liminf_{x \to x_0} f(x) := \sup_{r>0} \inf_{|x-x_0|<r} f(x)$.) In particular, we have

$\lim_{x \to x_0} \lim_{y \to y_0} f(x,y) = \lim_{y \to y_0} \lim_{x \to x_0} f(x,y)$

whenever the limits on both sides exist. (Note that the limits do not
necessarily exist in general; consider for instance the function $f : \mathbf{R}^2 \to \mathbf{R}$ such
that $f(x,y) = y \sin\frac{1}{x}$ when $xy \neq 0$ and $f(x,y) = 0$ otherwise.) Discuss the
comparison between this result and Example 1.2.7.


Exercise 2.2.10. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be a continuous function. Show that for
each $x \in \mathbf{R}$, the function $y \mapsto f(x,y)$ is continuous on $\mathbf{R}$, and for each $y \in \mathbf{R}$,
the function $x \mapsto f(x,y)$ is continuous on $\mathbf{R}$. Thus a function $f(x,y)$ which is
jointly continuous in $(x,y)$ is also continuous in each variable $x$, $y$ separately.


Exercise 2.2.11. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x,y) := \frac{xy}{x^2+y^2}$
when $(x,y) \neq (0,0)$, and $f(x,y) = 0$ otherwise. Show that for each fixed $x \in \mathbf{R}$,
the function $y \mapsto f(x,y)$ is continuous on $\mathbf{R}$, and that for each fixed $y \in \mathbf{R}$, the
function $x \mapsto f(x,y)$ is continuous on $\mathbf{R}$, but that the function $f : \mathbf{R}^2 \to \mathbf{R}$ is
not continuous on $\mathbf{R}^2$. This shows that the converse to Exercise 2.2.10 fails; it
is possible to be continuous in each variable separately without being jointly
continuous.


2.3 Continuity and compactness


Continuous functions interact well with the concept of compact sets 
defined in Definition 1.5.1. 


Theorem 2.3.1 (Continuous maps preserve compactness). Let $f :
X \to Y$ be a continuous map from one metric space $(X,d_X)$ to
another $(Y,d_Y)$. Let $K \subseteq X$ be any compact subset of $X$. Then the image
$f(K) := \{f(x) : x \in K\}$ of $K$ is also compact.


Proof. See Exercise 2.3.1. 


This theorem has an important consequence. Recall from Definition 
9.6.5 the notion of a function f : X — R attaining a maximum or 
minimum at a point. We may generalize Proposition 9.6.7 as follows: 


Proposition 2.3.2 (Maximum principle). Let $(X,d)$ be a compact metric
space, and let $f : X \to \mathbf{R}$ be a continuous function. Then $f$ is
bounded. Furthermore, $f$ attains its maximum at some point $x_{max} \in X$,
and also attains its minimum at some point $x_{min} \in X$.


Proof. See Exercise 2.3.2. 


Remark 2.3.3. As was already noted in Exercise 9.6.1, this principle 
can fail if X is not compact. This proposition should be compared with 
Lemma 9.6.3 and Proposition 9.6.7. 


Another advantage of continuous functions on compact sets is that 
they are uniformly continuous. We generalize Definition 9.9.2 as follows: 


Definition 2.3.4 (Uniform continuity). Let $f : X \to Y$ be a map
from one metric space $(X,d_X)$ to another $(Y,d_Y)$. We say that $f$ is
uniformly continuous if, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that
$d_Y(f(x), f(x')) < \varepsilon$ whenever $x, x' \in X$ are such that $d_X(x,x') < \delta$.


Every uniformly continuous function is continuous, but not con- 
versely (Exercise 2.3.3). But if the domain X is compact, then the 
two notions are equivalent: 


Theorem 2.3.5. Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces, and suppose
that $(X,d_X)$ is compact. If $f : X \to Y$ is a function, then $f$ is continuous
if and only if it is uniformly continuous.


Proof. If $f$ is uniformly continuous then it is also continuous by
Exercise 2.3.3. Now suppose that $f$ is continuous. Fix $\varepsilon > 0$. For
every $x_0 \in X$, the function $f$ is continuous at $x_0$. Thus there exists a
$\delta(x_0) > 0$, depending on $x_0$, such that $d_Y(f(x), f(x_0)) < \varepsilon/2$
whenever $d_X(x,x_0) < \delta(x_0)$. In particular, by the triangle inequality this
implies that $d_Y(f(x), f(x')) < \varepsilon$ whenever $x \in B_{(X,d_X)}(x_0,\delta(x_0)/2)$ and
$d_X(x',x) < \delta(x_0)/2$ (why?).

Now consider the (possibly infinite) collection of balls


$\{B_{(X,d_X)}(x_0, \delta(x_0)/2) : x_0 \in X\}.$


Each ball in this collection is of course open, and the union of all these
balls covers $X$, since each point $x_0$ in $X$ is contained in its own ball
$B_{(X,d_X)}(x_0,\delta(x_0)/2)$. Hence, by Theorem 1.5.8, there exist a finite
number of points $x_1,\ldots,x_n$ such that the balls $B_{(X,d_X)}(x_j, \delta(x_j)/2)$ for
$j = 1,\ldots,n$ cover $X$:


$X \subseteq \bigcup_{j=1}^n B_{(X,d_X)}(x_j, \delta(x_j)/2).$


Now let $\delta := \min_{j=1}^n \delta(x_j)/2$. Since each of the $\delta(x_j)$ are positive,
and there are only a finite number of $j$, we see that $\delta > 0$. Now
let $x, x'$ be any two points in $X$ such that $d_X(x,x') < \delta$. Since
the balls $B_{(X,d_X)}(x_j,\delta(x_j)/2)$ cover $X$, we see that there must exist
$1 \leq j \leq n$ such that $x \in B_{(X,d_X)}(x_j,\delta(x_j)/2)$. Since $d_X(x,x') < \delta$,
we have $d_X(x,x') < \delta(x_j)/2$, and so by the previous discussion we have
$d_Y(f(x), f(x')) < \varepsilon$. We have thus found a $\delta$ such that $d_Y(f(x), f(x')) <
\varepsilon$ whenever $d_X(x,x') < \delta$, and this proves uniform continuity as
desired.
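
The construction in this proof can be traced numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the compact set $X = [0.1, 1]$ with the standard metric and $f(x) := 1/x$, and the names `pointwise_delta` and `finite_subcover` are hypothetical. The pointwise choice $\delta(x_0) := \min(x_0/2, \varepsilon x_0^2/4)$ guarantees $|f(x) - f(x_0)| < \varepsilon/2$ whenever $|x - x_0| < \delta(x_0)$; the code then selects finitely many balls $B(x_j, \delta(x_j)/2)$ covering $X$ and sets $\delta := \min_j \delta(x_j)/2$, as above.

```python
def f(x):
    return 1.0 / x


A, B = 0.1, 1.0  # the compact domain X = [A, B] (illustrative choice)


def pointwise_delta(x0, eps):
    """A delta(x0) with |f(x) - f(x0)| < eps/2 whenever |x - x0| < delta(x0)."""
    return min(x0 / 2, eps * x0 * x0 / 4)


def finite_subcover(eps):
    """Centres x_1 < x_2 < ... whose balls B(x_j, delta(x_j)/2) cover [A, B]."""
    centers = [A]
    while centers[-1] + pointwise_delta(centers[-1], eps) / 2 <= B:
        last = centers[-1]
        # Step by less than the radius so consecutive balls overlap.
        centers.append(last + 0.9 * pointwise_delta(last, eps) / 2)
    return centers


eps = 0.1
centers = finite_subcover(eps)
delta = min(pointwise_delta(c, eps) / 2 for c in centers)

# Spot check: pairs closer than delta have images closer than eps.
n = 20000
worst = 0.0
for k in range(n):
    x = A + (B - A) * k / n
    x2 = min(B, x + 0.999 * delta)
    worst = max(worst, abs(f(x) - f(x2)))
```

The single $\delta$ obtained this way is governed by the smallest of the pointwise radii, which is positive precisely because only finitely many balls are needed.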


— Exercises — 
Exercise 2.3.1. Prove Theorem 2.3.1. 


Exercise 2.3.2. Prove Proposition 2.3.2. (Hint: modify the proof of Proposition 
9.6.7.) 


Exercise 2.3.3. Show that every uniformly continuous function is continuous, 
but give an example that shows that not every continuous function is uniformly 
continuous. 


Exercise 2.3.4. Let $(X,d_X)$, $(Y,d_Y)$, $(Z,d_Z)$ be metric spaces, and let $f :
X \to Y$ and $g : Y \to Z$ be two uniformly continuous functions. Show that
$g \circ f : X \to Z$ is also uniformly continuous.


Exercise 2.3.5. Let $(X,d_X)$ be a metric space, and let $f : X \to \mathbf{R}$ and $g : X \to
\mathbf{R}$ be uniformly continuous functions. Show that the direct sum $f \oplus g : X \to \mathbf{R}^2$
defined by $f \oplus g(x) := (f(x), g(x))$ is uniformly continuous.


Exercise 2.3.6. Show that the addition function $(x,y) \mapsto x+y$ and the
subtraction function $(x,y) \mapsto x-y$ are uniformly continuous from $\mathbf{R}^2$ to $\mathbf{R}$, but
the multiplication function $(x,y) \mapsto xy$ is not. Conclude that if $f : X \to \mathbf{R}$
and $g : X \to \mathbf{R}$ are uniformly continuous functions on a metric space $(X,d)$,
then $f+g : X \to \mathbf{R}$ and $f-g : X \to \mathbf{R}$ are also uniformly continuous. Give an
example to show that $fg : X \to \mathbf{R}$ need not be uniformly continuous. What is
the situation for $\max(f,g)$, $\min(f,g)$, $f/g$, and $cf$ for a real number $c$?


2.4 Continuity and connectedness 


We now describe another important concept in metric spaces, that of 
connectedness. 


Definition 2.4.1 (Connected spaces). Let $(X,d)$ be a metric space. We
say that $X$ is disconnected iff there exist disjoint non-empty open sets
$V$ and $W$ in $X$ such that $V \cup W = X$. (Equivalently, $X$ is disconnected
if and only if $X$ contains a non-empty proper subset which is simultaneously
closed and open.) We say that $X$ is connected iff it is non-empty
and not disconnected.


We declare the empty set $\emptyset$ as being special - it is neither connected
nor disconnected; one could think of the empty set as “unconnected”.


Example 2.4.2. Consider the set X := [1,2] U [3,4], with the usual 
metric. This set is disconnected because the sets [1,2] and [3,4] are 
open relative to X (why?). 


Intuitively, a disconnected set is one which can be separated into 
two disjoint open sets; a connected set is one which cannot be separated 
in this manner. We defined what it means for a metric space to be 
connected; we can also define what it means for a set to be connected. 


Definition 2.4.3 (Connected sets). Let $(X,d)$ be a metric space, and
let $Y$ be a subset of $X$. We say that $Y$ is connected iff the metric space
$(Y, d|_{Y \times Y})$ is connected, and we say that $Y$ is disconnected iff the metric
space $(Y, d|_{Y \times Y})$ is disconnected.


Remark 2.4.4. This definition is intrinsic; whether a set $Y$ is connected
or not depends only on what the metric is doing on $Y$, but not on what
ambient space $X$ one is placing $Y$ in.


On the real line, connected sets are easy to describe. 


Theorem 2.4.5. Let X be a subset of the real line R. Then the following 
statements are equivalent. 


(a) X is connected. 


(b) Whenever $x, y \in X$ and $x < y$, the interval $[x,y]$ is also contained
in $X$.


(c) X is an interval (in the sense of Definition 9.1.1). 


Proof. First we show that (a) implies (b). Suppose that $X$ is connected,
and suppose for sake of contradiction that we could find points $x < y$
in $X$ such that $[x,y]$ is not contained in $X$. Then there exists a real
number $x < z < y$ such that $z \notin X$. Thus the sets $(-\infty,z) \cap X$ and
$(z,\infty) \cap X$ will cover $X$. But these sets are non-empty (because they
contain $x$ and $y$ respectively) and are open relative to $X$, and so $X$ is
disconnected, a contradiction.

Now we show that (b) implies (a). Let $X$ be a set obeying the
property (b). Suppose for sake of contradiction that $X$ is disconnected.
Then there exist disjoint non-empty sets $V$, $W$ which are open relative
to $X$, such that $V \cup W = X$. Since $V$ and $W$ are non-empty, we may
choose an $x \in V$ and $y \in W$. Since $V$ and $W$ are disjoint, we have
$x \neq y$; without loss of generality we may assume $x < y$. By property
(b), we know that the entire interval $[x,y]$ is contained in $X$.

Now consider the set $[x,y] \cap V$. This set is both bounded and
non-empty (because it contains $x$). Thus it has a supremum


$z := \sup([x,y] \cap V).$


Clearly $z \in [x,y]$, and hence $z \in X$. Thus either $z \in V$ or $z \in W$.
Suppose first that $z \in V$. Then $z \neq y$ (since $y \in W$ and $V$ is disjoint
from $W$). But $V$ is open relative to $X$, which contains $[x,y]$, so there
is some ball $B_{([x,y],d)}(z,r)$ which is contained in $V$. But this contradicts
the fact that $z$ is the supremum of $[x,y] \cap V$. Now suppose that $z \in W$.
Then $z \neq x$ (since $x \in V$ and $V$ is disjoint from $W$). But $W$ is open
relative to $X$, which contains $[x,y]$, so there is some ball $B_{([x,y],d)}(z,r)$
which is contained in $W$. But this again contradicts the fact that $z$ is the
supremum of $[x,y] \cap V$. Thus in either case we obtain a contradiction,
which means that $X$ cannot be disconnected, and must therefore be
connected.

It remains to show that (b) and (c) are equivalent; we leave this to
Exercise 2.4.3.


Continuous functions map connected sets to connected sets: 


Theorem 2.4.6 (Continuity preserves connectedness). Let $f : X \to Y$
be a continuous map from one metric space $(X,d_X)$ to another $(Y,d_Y)$.
Let $E$ be any connected subset of $X$. Then $f(E)$ is also connected.


Proof. See Exercise 2.4.4. 


An important corollary of this result is the intermediate value theo- 
rem, generalizing Theorem 9.7.1. 


Corollary 2.4.7 (Intermediate value theorem). Let $f : X \to \mathbf{R}$ be a
continuous map from one metric space $(X,d_X)$ to the real line. Let $E$
be any connected subset of $X$, and let $a, b$ be any two elements of $E$. Let
$y$ be a real number between $f(a)$ and $f(b)$, i.e., either $f(a) \leq y \leq f(b)$
or $f(a) \geq y \geq f(b)$. Then there exists $c \in E$ such that $f(c) = y$.


Proof. See Exercise 2.4.5. 
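
On the real line, with $E = [a,b]$ an interval, a point $c$ as in Corollary 2.4.7 can be approximated by bisection. This is a standard numerical technique and not the argument suggested in Exercise 2.4.5; the function name `bisect_value`, the tolerance, and the sample function $x \mapsto x^3 - x$ are assumptions of the sketch.

```python
def bisect_value(f, a, b, y, tol=1e-12):
    """Approximate c between a and b with f(c) = y, for a continuous f
    and a value y lying between f(a) and f(b)."""
    lo, hi = (a, b) if f(a) <= f(b) else (b, a)  # ensure f(lo) <= f(hi)
    if not (f(lo) <= y <= f(hi)):
        raise ValueError("y must lie between f(a) and f(b)")
    # Invariant: f(lo) <= y <= f(hi), so a solution lies between lo and hi.
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if f(mid) <= y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = bisect_value(lambda x: x ** 3 - x, 1.0, 2.0, 3.0)
```

Each step halves an interval whose endpoints still bracket the value $y$, so the returned point lies within the tolerance of a genuine solution; the existence of that solution is exactly what the corollary provides.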


— Exercises — 


Exercise 2.4.1. Let $(X,d_{disc})$ be a metric space with the discrete metric. Let
$E$ be a subset of $X$ which contains at least two elements. Show that $E$ is
disconnected.


Exercise 2.4.2. Let $f : X \to Y$ be a function from a connected metric space
$(X,d)$ to a metric space $(Y,d_{disc})$ with the discrete metric. Show that $f$ is
continuous if and only if it is constant. (Hint: use Exercise 2.4.1.)


Exercise 2.4.3. Prove the equivalence of statements (b) and (c) in Theorem 
2.4.5. 


Exercise 2.4.4. Prove Theorem 2.4.6. (Hint: the formulation of continuity in 
Theorem 2.1.5(c) is the most convenient to use.) 


Exercise 2.4.5. Use Theorem 2.4.6 to prove Corollary 2.4.7. 


Exercise 2.4.6. Let $(X,d)$ be a metric space, and let $(E_\alpha)_{\alpha \in I}$ be a collection
of connected sets in $X$. Suppose also that $\bigcap_{\alpha \in I} E_\alpha$ is non-empty. Show that
$\bigcup_{\alpha \in I} E_\alpha$ is connected.


Exercise 2.4.7. Let $(X,d)$ be a metric space, and let $E$ be a subset of $X$. We
say that $E$ is path-connected iff, for every $x, y \in E$, there exists a continuous
function $\gamma : [0,1] \to E$ from the unit interval $[0,1]$ to $E$ such that $\gamma(0) = x$ and
$\gamma(1) = y$. Show that every path-connected set is connected. (The converse is
false, but is a bit tricky to show and will not be detailed here.)


Exercise 2.4.8. Let $(X,d)$ be a metric space, and let $E$ be a subset of $X$. Show
that if $E$ is connected, then the closure $\overline{E}$ of $E$ is also connected. Is the converse
true?

Exercise 2.4.9. Let $(X,d)$ be a metric space. Let us define a relation $x \sim y$ on
$X$ by declaring $x \sim y$ iff there exists a connected subset of $X$ which contains
both $x$ and $y$. Show that this is an equivalence relation (i.e., it obeys the
reflexive, symmetric, and transitive axioms). Also, show that the equivalence
classes of this relation (i.e., the sets of the form $\{y \in X : y \sim x\}$ for some
$x \in X$) are all closed and connected. (Hint: use Exercise 2.4.6 and Exercise
2.4.8.) These sets are known as the connected components of $X$.

Exercise 2.4.10. Combine Proposition 2.3.2 and Corollary 2.4.7 to deduce a 
theorem for continuous functions on a compact connected domain which gen- 
eralizes Corollary 9.7.4. 


2.5 Topological spaces (Optional) 


The concept of a metric space can be generalized to that of a topological 
space. The idea here is not to view the metric d as the fundamental 
object; indeed, in a general topological space there is no metric at all. 
Instead, it is the collection of open sets which is the fundamental concept. 
Thus, whereas in a metric space one introduces the metric d first, and 
then uses the metric to define first the concept of an open ball and then 
the concept of an open set, in a topological space one starts just with 
the notion of an open set. As it turns out, starting from the open sets, 
one cannot necessarily reconstruct a usable notion of a ball or metric 
(thus not all topological spaces will be metric spaces), but remarkably 
one can still define many of the concepts in the preceding sections. 

We will not use topological spaces at all in this text, and so we shall 
be rather brief in our treatment of them here. A more complete study 
of these spaces can of course be found in any topology textbook, or a 
more advanced analysis text. 


Definition 2.5.1 (Topological spaces). A topological space is a pair 
(X,F), where X is a set, and F C 2* is a collection of subsets of X, 
whose elements are referred to as open sets. Furthermore, the collection 
F must obey the following properties: 


e The empty set ( and the whole set X are open; in other words, 
PeFand xX ef. 


e Any finite intersection of open sets is open. In other words, if 
Vi,...,Vn are elements of F, then Vj N...9 Vy, is also in F. 
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e Any arbitrary union of open sets is open (including infinite unions). 
In other words, if (Va)aer is a family of sets in F, then Ue; Va is 
also in F. 


In many cases, the collection ¥ of open sets can be deduced from context, 
and we shall refer to the topological space (X,F) simply as X. 
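
For a finite set X, the axioms in Definition 2.5.1 can be checked mechanically. The following Python sketch is only an illustration and not part of the development of the text; the helper name is_topology is hypothetical, and it assumes that F is given as a finite list of subsets. It relies on the observation that, for a finite collection, closure under pairwise unions and intersections already implies closure under all unions and all finite intersections (by induction).

```python
from itertools import combinations

def is_topology(X, F):
    """Check the axioms of Definition 2.5.1 for a finite set X and a
    finite collection F of subsets of X (hypothetical helper)."""
    X = frozenset(X)
    F = {frozenset(V) for V in F}
    # Every member of F must be a subset of X.
    if any(not V <= X for V in F):
        return False
    # First axiom: the empty set and the whole set are open.
    if frozenset() not in F or X not in F:
        return False
    # Second and third axioms: for a finite collection it suffices to
    # check pairwise intersections and pairwise unions.
    for V, W in combinations(F, 2):
        if V & W not in F or V | W not in F:
            return False
    return True

X = {1, 2, 3}
print(is_topology(X, [set(), X]))               # the trivial topology
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # a nested family of open sets
print(is_topology(X, [set(), {1}, {2}, X]))     # the union {1, 2} is missing
```

The last collection fails only the union axiom, which shows that the three properties must all be verified.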


From Proposition 1.2.15 we see that every metric space (X,d) is au- 
tomatically also a topological space (if we set F equal to the collection 
of sets which are open in (X,d)). However, there do exist topologi- 
cal spaces which do not arise from metric spaces (see Exercise 2.5.1, 
2.5.6). 

We now develop the analogues of various notions in this chapter and 
the previous chapter for topological spaces. The notion of a ball must 
be replaced by the notion of a neighbourhood. 


Definition 2.5.2 (Neighbourhoods). Let (X,F) be a topological space, 
and let x ∈ X. A neighbourhood of x is defined to be any open set in F 
which contains x. 


Example 2.5.3. If (X,d) is a metric space, x ∈ X, and r > 0, then 
B(x,r) is a neighbourhood of x. 


Definition 2.5.4 (Topological convergence). Let m be an integer, 
(X,F) be a topological space and let (x^(n))_{n=m}^∞ be a sequence of points 
in X. Let x be a point in X. We say that (x^(n))_{n=m}^∞ converges to x if 
and only if, for every neighbourhood V of x, there exists an N ≥ m such 
that x^(n) ∈ V for all n ≥ N. 


This notion is consistent with that of convergence in metric spaces 
(Exercise 2.5.2). One can then ask whether one has the basic property 
of uniqueness of limits (Proposition 1.1.20). The answer turns out to 
usually be yes - if the topological space has an additional property known 
as the Hausdorff property - but the answer can be no for other topologies; 
see Exercise 2.5.4. 


Definition 2.5.5 (Interior, exterior, boundary). Let (X,F) be a topo- 
logical space, let E be a subset of X, and let x_0 be a point in X. We 
say that x_0 is an interior point of E if there exists a neighbourhood V 
of x_0 such that V ⊆ E. We say that x_0 is an exterior point of E if there 
exists a neighbourhood V of x_0 such that V ∩ E = ∅. We say that x_0 
is a boundary point of E if it is neither an interior point nor an exterior 
point of E. 
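
On a finite topological space, Definition 2.5.5 can be applied point by point. The sketch below is an illustration under that finiteness assumption; the helper name classify_points and the sample space are hypothetical. Note that a point cannot be both interior and exterior, since an interior point lies in E and an exterior point does not.

```python
def classify_points(X, F, E):
    """Split X into the interior, exterior and boundary points of E,
    following Definition 2.5.5 (finite case, hypothetical helper)."""
    E = frozenset(E)
    opens = [frozenset(V) for V in F]
    interior, exterior, boundary = set(), set(), set()
    for x in X:
        # The neighbourhoods of x are the open sets containing x.
        nbhds = [V for V in opens if x in V]
        if any(V <= E for V in nbhds):
            interior.add(x)
        elif any(not (V & E) for V in nbhds):
            exterior.add(x)
        else:
            boundary.add(x)
    return interior, exterior, boundary

X = {1, 2, 3}
F = [set(), {1}, {1, 2}, X]
print(classify_points(X, F, {1, 3}))
```

In this example the only neighbourhood of 3 is X itself, so 3 is a boundary point of {1, 3} even though it belongs to that set.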


This definition is consistent with the corresponding notion for metric 
spaces (Exercise 2.5.3). 


Definition 2.5.6 (Closure). Let (X,F) be a topological space, let E be 
a subset of X, and let x_0 be a point in X. We say that x_0 is an adherent 
point of E if every neighbourhood V of x_0 has a non-empty intersection 
with E. The set of all adherent points of E is called the closure of E 
and is denoted \overline{E}. 


There is a partial analogue of Proposition 1.2.10; see Exercise 2.5.10. 

We define a set K in a topological space (X,F) to be closed iff 
its complement X\K is open; this is consistent with the metric space 
definition, thanks to Proposition 1.2.15(e). Some partial analogues of 
that Proposition are true (see Exercise 2.5.11). 

To define the notion of a relative topology, we cannot use Definition 
1.3.3 as this requires a metric function. However, we can instead use 
Proposition 1.3.4 as our starting point: 


Definition 2.5.7 (Relative topology). Let (X,F) be a topological 
space, and Y be a subset of X. Then we define F_Y := {V ∩ Y : V ∈ F}, 
and refer to this as the topology on Y induced by (X,F). We call (Y,F_Y) 
a topological subspace of (X,F). This is indeed a topological space; see 
Exercise 2.5.12. 
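
As a small illustration of Definition 2.5.7 in the finite case, the induced collection F_Y can be computed directly by intersecting each open set with Y. The function name induced_topology and the sample data below are hypothetical.

```python
def induced_topology(F, Y):
    """Return F_Y = {V ∩ Y : V in F} as a set of frozensets
    (finite case, hypothetical helper)."""
    Y = frozenset(Y)
    return {frozenset(V) & Y for V in F}

F = [set(), {1}, {1, 2}, {1, 2, 3}]
F_Y = induced_topology(F, {2, 3})
print(sorted(sorted(V) for V in F_Y))
```

Distinct open sets of X may have the same intersection with Y (here both the empty set and {1} give the empty set), so F_Y can have fewer elements than F.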


From Proposition 1.3.4 we see that this notion is compatible with 
the one for metric spaces. 

Next we define the notion of continuity. 


Definition 2.5.8 (Continuous functions). Let (X,F_X) and (Y,F_Y) be 
topological spaces, and let f : X → Y be a function. If x_0 ∈ X, we say 
that f is continuous at x_0 iff for every neighbourhood V of f(x_0), there 
exists a neighbourhood U of x_0 such that f(U) ⊆ V. We say that f is 
continuous iff it is continuous at every point x ∈ X. 


This definition is consistent with that in Definition 2.1.1 (Exercise 
2.5.15). Partial analogues of Theorems 2.1.4 and 2.1.5 are available 
(Exercise 2.5.16). In particular, a function is continuous iff the pre- 
image of every open set is open. 
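
The pre-image criterion is easy to test on finite spaces. The following sketch assumes that a function is stored as a Python dictionary and that both topologies are given as finite lists of subsets; the names preimage and is_continuous are hypothetical.

```python
def preimage(f, V, X):
    """Return the set of x in X with f[x] in V (f is a dict on X)."""
    return frozenset(x for x in X if f[x] in V)

def is_continuous(f, X, F_X, F_Y):
    """True if the pre-image under f of every set in F_Y lies in F_X."""
    open_X = {frozenset(U) for U in F_X}
    return all(preimage(f, V, X) in open_X for V in F_Y)

X = {1, 2}
trivial = [set(), {1, 2}]
discrete = [set(), {1}, {2}, {1, 2}]
identity = {1: 1, 2: 2}

print(is_continuous(identity, X, discrete, trivial))  # from discrete to trivial
print(is_continuous(identity, X, trivial, discrete))  # from trivial to discrete
```

The same identity map is continuous in one direction and not in the other, which illustrates that continuity depends on the two collections of open sets and not only on the function.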

There is unfortunately no notion of a Cauchy sequence, a complete 
space, or a bounded space, for topological spaces. However, there is 
certainly a notion of a compact space, as we can see by taking Theorem 
1.5.8 as our starting point: 


Definition 2.5.9 (Compact topological spaces). Let (X,F) be a topo- 
logical space. We say that this space is compact if every open cover of 
X has a finite subcover. If Y is a subset of X, we say that Y is compact 
if the topological space on Y induced by (X,F) is compact. 


Many basic facts about compact metric spaces continue to hold true 
for compact topological spaces, notably Theorem 2.3.1 and Proposition 
2.3.2 (Exercise 2.5.17). However, there is no notion of uniform continu- 
ity, and so there is no analogue of Theorem 2.3.5. 

We can also define the notion of connectedness by repeating Defini- 
tion 2.4.1 verbatim, and also repeating Definition 2.4.3 (but with Defini- 
tion 2.5.7 instead of Definition 1.3.3). Many of the results and exercises 
in Section 2.4 continue to hold for topological spaces (with almost no 
changes to any of the proofs!). 


— Exercises — 


Exercise 2.5.1. Let X be an arbitrary set, and let F := {∅,X}. Show that 
(X,F) is a topology (called the trivial topology on X). If X contains more 
than one element, show that the trivial topology cannot be obtained by 
placing a metric d on X. Show that this topological space is both compact and 
connected. 


Exercise 2.5.2. Let (X,d) be a metric space (and hence a topological space). 
Show that the two notions of convergence of sequences in Definition 1.1.14 and 
Definition 2.5.4 coincide. 


Exercise 2.5.3. Let (X,d) be a metric space (and hence a topological space). 
Show that the two notions of interior, exterior, and boundary in Definition 
1.2.5 and Definition 2.5.5 coincide. 


Exercise 2.5.4. A topological space (X,F) is said to be Hausdorff if given 
any two distinct points x,y ∈ X, there exists a neighbourhood V of x and a 
neighbourhood W of y such that V ∩ W = ∅. Show that any topological space 
coming from a metric space is Hausdorff, and show that the trivial topology is 
not Hausdorff. Show that the analogue of Proposition 1.1.20 holds for Hausdorff 
topological spaces, but give an example of a non-Hausdorff topological space in 
which Proposition 1.1.20 fails. (In practice, most topological spaces one works 
with are Hausdorff; non-Hausdorff topological spaces tend to be so pathological 
that it is not very profitable to work with them.) 


Exercise 2.5.5. Given any totally ordered set X with order relation <, declare a 
set V ⊆ X to be open if for every x ∈ V there exists a set I which is an interval 
{y ∈ X : a < y < b} for some a,b ∈ X, a ray {y ∈ X : a < y} for some a ∈ X, 
the ray {y ∈ X : y < b} for some b ∈ X, or the whole space X, which contains 
x and is contained in V. Let F be the set of all open subsets of X. Show 
that (X,F) is a topology (this is the order topology on the totally ordered set 
(X,<)) which is Hausdorff in the sense of Exercise 2.5.4. Show that on the real 
line R (with the standard ordering <), the order topology matches the standard 
topology (i.e., the topology arising from the standard metric). If instead one 
applies this to the extended real line R*, show that R is an open set with 
boundary {−∞, +∞}. If (x_n)_{n=1}^∞ is a sequence of numbers in R (and hence in 
R*), show that x_n converges to +∞ if and only if lim inf_{n→∞} x_n = +∞, and 
x_n converges to −∞ if and only if lim sup_{n→∞} x_n = −∞. 


Exercise 2.5.6. Let X be an uncountable set, and let F be the collection of all 
subsets E in X which are either empty or co-finite (which means that X\E is 
finite). Show that (X,F) is a topology (this is called the cofinite topology on 
X) which is not Hausdorff in the sense of Exercise 2.5.4, and is compact and 
connected. Also, show that if x ∈ X and (V_n)_{n=1}^∞ is any countable collection of 
open sets containing x, then ⋂_{n=1}^∞ V_n ≠ {x}. Use this to show that the cofinite 
topology cannot be obtained by placing a metric d on X. (Hint: what is the 
set ⋂_{n=1}^∞ B(x,1/n) equal to in a metric space?) 


Exercise 2.5.7. Let X be an uncountable set, and let F be the collection of 
all subsets E in X which are either empty or co-countable (which means that 
X\E is at most countable). Show that (X,F) is a topology (this is called 
the cocountable topology on X) which is not Hausdorff in the sense of Exercise 
2.5.4, and connected, but cannot arise from a metric space and is not compact. 


Exercise 2.5.8. Show that there exists an uncountable well-ordered set ω_1 + 1 
that has a maximal element ∞, and such that the initial segments {x ∈ ω_1 + 1 : 
x < y} are countable for all y ∈ (ω_1 + 1)\{∞}. (Hint: Well-order the real numbers 
using Exercise 8.5.19, take the union of all the countable initial segments, and 
then adjoin a maximal element ∞.) If we give ω_1 + 1 the order topology 
(Exercise 2.5.5), show that ω_1 + 1 is compact; however, show that not every 
sequence has a convergent subsequence. 


Exercise 2.5.9. Let (X,F) be a compact topological space. Assume that this 
space is first countable, which means that for every x ∈ X there exists a countable 
collection V_1,V_2,... of neighbourhoods of x, such that every neighbourhood 
of x contains one of the V_n. Show that every sequence in X has a convergent 
subsequence, by modifying Exercise 1.5.11. Explain why this does not 
contradict Exercise 2.5.8. 


Exercise 2.5.10. Prove the following partial analogue of Proposition 1.2.10 for 
topological spaces: (c) implies both (a) and (b), which are equivalent to each 
other. Show that in the co-countable topology in Exercise 2.5.7, it is possible 
for (a) and (b) to hold without (c) holding. 


Exercise 2.5.11. Let E be a subset of a topological space (X,F). Show that E 
is open if and only if every element of E is an interior point, and show that E 
is closed if and only if E contains all of its adherent points. Prove analogues 
of Proposition 1.2.15(e)-(h) (some of these are automatic by definition). If 
we assume in addition that X is Hausdorff, prove an analogue of Proposition 
1.2.15(d) also, but give an example to show that (d) can fail when X is not 
Hausdorff. 


Exercise 2.5.12. Show that the pair (Y,F_Y) defined in Definition 2.5.7 is indeed 
a topological space. 


Exercise 2.5.13. Generalize Corollary 1.5.9 to compact sets in a topological 
space. 


Exercise 2.5.14. Generalize Theorem 1.5.10 to compact sets in a topological 
space. 


Exercise 2.5.15. Let (X,d_X) and (Y,d_Y) be metric spaces (and hence topo- 
logical spaces). Show that the two notions of continuity (both at a point, and on 
the whole domain) of a function f : X → Y in Definition 2.1.1 and Definition 
2.5.8 coincide. 


Exercise 2.5.16. Show that when Theorem 2.1.4 is extended to topological 
spaces, (a) implies (b). (The converse is false, but constructing an example 
is difficult.) Show that when Theorem 2.1.5 is extended to topological spaces, 
(a), (c), (d) are all equivalent to each other, and imply (b). (Again, the 
converse implications are false, but difficult to prove.) 


Exercise 2.5.17. Generalize both Theorem 2.3.1 and Proposition 2.3.2 to com- 
pact sets in a topological space. 


Chapter 3 


Uniform convergence 


In the previous two chapters we have seen what it means for a sequence 
(x^(n))_{n=1}^∞ of points in a metric space (X,d_X) to converge to a limit x; it 
means that lim_{n→∞} d_X(x^(n), x) = 0, or equivalently that for every ε > 0 
there exists an N > 0 such that d_X(x^(n), x) < ε for all n > N. (We have 
also generalized the notion of convergence to topological spaces (X,F), 
but in this chapter we will focus on metric spaces.) 

In this chapter, we consider what it means for a sequence of functions 
(f^(n))_{n=1}^∞ from one metric space (X,d_X) to another (Y,d_Y) to converge. 
In other words, we have a sequence of functions f^(1), f^(2), ..., with each 
function f^(n) : X → Y being a function from X to Y, and we ask what 
it means for this sequence of functions to converge to some limiting 
function f. 

It turns out that there are several different concepts of convergence 
of functions; here we describe the two most important ones, pointwise 
convergence and uniform convergence. (There are other types of conver- 
gence for functions, such as L^1 convergence, L^2 convergence, convergence 
in measure, almost everywhere convergence, and so forth, but these are 
beyond the scope of this text.) The two notions are related, but not 
identical; the relationship between the two is somewhat analogous to 
the relationship between continuity and uniform continuity. 

Once we work out what convergence means for functions, and thus 
can make sense of such statements as lim_{n→∞} f^(n) = f, we will then ask 
how these limits interact with other concepts. For instance, we already 
have a notion of limiting values of functions: lim_{x→x_0; x∈X} f(x). Can we 
interchange limits, i.e. 


lim_{n→∞} lim_{x→x_0; x∈X} f^(n)(x) = lim_{x→x_0; x∈X} lim_{n→∞} f^(n)(x)? 


As we shall see, the answer depends on what type of convergence we have 
for f^(n). We will also address similar questions involving interchanging 
limits and integrals, or limits and sums, or sums and integrals. 


3.1 Limiting values of functions 


Before we talk about limits of sequences of functions, we should first 
discuss a similar, but distinct, notion, that of limiting values of functions. 
We shall focus on the situation for metric spaces, but there are similar 
notions for topological spaces (Exercise 3.1.3). 


Definition 3.1.1 (Limiting value of a function). Let (X,d_X) and 
(Y,d_Y) be metric spaces, let E be a subset of X, and let f : X → Y 
be a function. If x_0 ∈ X is an adherent point of E, and L ∈ Y, we 
say that f(x) converges to L in Y as x converges to x_0 in E, or write 
lim_{x→x_0; x∈E} f(x) = L, if for every ε > 0 there exists a δ > 0 such that 
d_Y(f(x), L) < ε for all x ∈ E such that d_X(x, x_0) < δ. 


Remark 3.1.2. Some authors exclude the case x = x_0 from the above 
definition, thus requiring 0 < d_X(x,x_0) < δ. In our current notation, 
this would correspond to removing x_0 from E, thus one would consider 
lim_{x→x_0; x∈E\{x_0}} f(x) instead of lim_{x→x_0; x∈E} f(x). See Exercise 3.1.1 for 
a comparison of the two concepts. 


Comparing this with Definition 2.1.1, we see that f is continuous at 
x_0 if and only if 


lim_{x→x_0; x∈X} f(x) = f(x_0). 


Thus f is continuous on X iff we have 


lim_{x→x_0; x∈X} f(x) = f(x_0) for all x_0 ∈ X. 


Example 3.1.3. If f : R → R is the function f(x) = x^2 − 4, then 


lim_{x→1} f(x) = f(1) = 1 − 4 = −3 


since f is continuous. 
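
To connect this example with Definition 3.1.1, one can exhibit an explicit δ for each ε. The choice δ := min(1, ε/3) below is our own assumption and not taken from the text: if |x − 1| < δ then |x + 1| < 3, so |f(x) − f(1)| = |x − 1||x + 1| < 3δ ≤ ε. The following Python sketch (with hypothetical helper names) samples points near 1 and checks the inequality numerically; it is a sanity check, not a proof.

```python
def f(x):
    return x ** 2 - 4

def delta_for(eps):
    """A delta that works at x_0 = 1 (assumed choice): min(1, eps/3)."""
    return min(1.0, eps / 3)

def check(eps, samples=1000):
    """Test |f(x) - f(1)| < eps at sample points with |x - 1| < delta."""
    delta = delta_for(eps)
    # The points below lie strictly inside (1 - delta, 1 + delta).
    xs = [1 + delta * (2 * k / samples - 1) for k in range(1, samples)]
    return all(abs(f(x) - f(1)) < eps for x in xs)

for eps in (10, 1, 0.1, 0.001):
    print(eps, delta_for(eps), check(eps))
```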


Remark 3.1.4. Often we shall omit the condition x ∈ X, and abbrevi- 
ate lim_{x→x_0; x∈X} f(x) as simply lim_{x→x_0} f(x) when it is clear what space 
x will range in. 

One can rephrase Definition 3.1.1 in terms of sequences: 


Proposition 3.1.5. Let (X,d_X) and (Y,d_Y) be metric spaces, let E be 
a subset of X, and let f : X → Y be a function. Let x_0 ∈ X be an 
adherent point of E and L ∈ Y. Then the following four statements are 
logically equivalent: 


(a) lim_{x→x_0; x∈E} f(x) = L. 


(b) For every sequence (x^(n))_{n=1}^∞ in E which converges to x_0 with re- 
spect to the metric d_X, the sequence (f(x^(n)))_{n=1}^∞ converges to L 
with respect to the metric d_Y. 


(c) For every open set V ⊆ Y which contains L, there exists an open 
set U ⊆ X containing x_0 such that f(U ∩ E) ⊆ V. 


(d) If one defines the function g : E ∪ {x_0} → Y by defining g(x_0) := L, 
and g(x) := f(x) for x ∈ E\{x_0}, then g is continuous at x_0. 
Furthermore, if x_0 ∈ E, then f(x_0) = L. 


Proof. See Exercise 3.1.2. 


Remark 3.1.6. Observe from Proposition 3.1.5(b) and Proposition 
1.1.20 that a function f(x) can converge to at most one limit L as x 
converges to x_0. In other words, if the limit 


lim_{x→x_0; x∈E} f(x) 


exists at all, then it can only take at most one value. 


Remark 3.1.7. The requirement that x_0 be an adherent point of E 
is necessary for the concept of limiting value to be useful; otherwise x_0 
will lie in the exterior of E, and the notion that f(x) converges to L as x 
converges to x_0 in E is vacuous (for δ sufficiently small, there are no 
points x ∈ E so that d(x, x_0) < δ). 


Remark 3.1.8. Strictly speaking, we should write 


d_Y − lim_{x→x_0; x∈E} f(x) instead of lim_{x→x_0; x∈E} f(x), 


since the convergence depends on the metric d_Y. However in practice it 
will be obvious what the metric d_Y is and so we will omit the d_Y − prefix 
from the notation. 


— Exercises — 


Exercise 3.1.1. Let (X,d_X) and (Y,d_Y) be metric spaces, let E be a subset of 
X, let f : E → Y be a function, and let x_0 be an element of E. Show that 
the limit lim_{x→x_0; x∈E} f(x) exists if and only if the limit lim_{x→x_0; x∈E\{x_0}} f(x) 
exists and is equal to f(x_0). Also, show that if the limit lim_{x→x_0; x∈E} f(x) 
exists at all, then it must equal f(x_0). 


Exercise 3.1.2. Prove Proposition 3.1.5. (Hint: review your proof of Theorem 
2.1.4.) 


Exercise 3.1.3. Use Proposition 3.1.5(c) to define a notion of a limiting value of 
a function f : X → Y from one topological space (X,F_X) to another (Y,F_Y). 
Then prove the equivalence of Proposition 3.1.5(c) and 3.1.5(d). If in addition 
Y is a Hausdorff topological space (see Exercise 2.5.4), prove an analogue of 
Remark 3.1.6. Is the same statement true if Y is not Hausdorff? 

Exercise 3.1.4. Recall from Exercise 2.5.5 that the extended real line R* comes 
with a standard topology (the order topology). We view the natural numbers 
N as a subspace of this topological space, and +∞ as an adherent point of N 
in R*. Let (a_n)_{n=0}^∞ be a sequence taking values in a topological space (Y,F_Y), 
and let L ∈ Y. Show that lim_{n→+∞; n∈N} a_n = L (in the sense of Exercise 3.1.3) 
if and only if lim_{n→∞} a_n = L (in the sense of Definition 2.5.4). This shows that 
the notions of limiting values of a sequence, and limiting values of a function, 
are compatible. 

Exercise 3.1.5. Let (X,d_X), (Y,d_Y), (Z,d_Z) be metric spaces, and let x_0 ∈ X, 
y_0 ∈ Y, z_0 ∈ Z. Let f : X → Y and g : Y → Z be functions, and let E be a 
set. If we have lim_{x→x_0; x∈E} f(x) = y_0 and lim_{y→y_0; y∈f(E)} g(y) = z_0, conclude 
that lim_{x→x_0; x∈E} g ∘ f(x) = z_0. 

Exercise 3.1.6. State and prove an analogue of the limit laws in Proposition 
9.3.14 when X is now a metric space rather than a subset of R. (Hint: use 
Corollary 2.2.3.) 


3.2 Pointwise and uniform convergence 


The most obvious notion of convergence of functions is pointwise con- 
vergence, or convergence at each point of the domain: 


Definition 3.2.1 (Pointwise convergence). Let (f^(n))_{n=1}^∞ be a sequence 
of functions from one metric space (X,d_X) to another (Y,d_Y), and let 
f : X → Y be another function. We say that (f^(n))_{n=1}^∞ converges 
pointwise to f on X if we have 


lim_{n→∞} f^(n)(x) = f(x) 


for all x ∈ X, i.e. 


lim_{n→∞} d_Y(f^(n)(x), f(x)) = 0. 


Or in other words, for every x and every ε > 0 there exists N > 0 such 
that d_Y(f^(n)(x), f(x)) < ε for every n > N. We call the function f the 
pointwise limit of the functions f^(n). 


Remark 3.2.2. Note that f^(n)(x) and f(x) are points in Y, rather than 
functions, so we are using our prior notion of convergence in metric 
spaces to determine convergence of functions. Also note that we are 
not really using the fact that (X,d_X) is a metric space (i.e., we are not 
using the metric d_X); for this definition it would suffice for X to just 
be a plain old set with no metric structure. However, later on we shall 
want to restrict our attention to continuous functions from X to Y, 
and in order to do so we need a metric on X (and on Y), or at least a 
topological structure. Also when we introduce the concept of uniform 
convergence, then we will definitely need a metric structure on X and 
Y; there is no comparable notion for topological spaces. 


Example 3.2.3. Consider the functions f^(n) : R → R defined by 
f^(n)(x) := x/n, while f : R → R is the zero function f(x) := 0. Then 
f^(n) converges pointwise to f, since for each fixed real number x we have 
lim_{n→∞} f^(n)(x) = lim_{n→∞} x/n = 0 = f(x). 


From Proposition 1.1.20 we see that a sequence (f^(n))_{n=1}^∞ of functions 
from one metric space (X,d_X) to another (Y,d_Y) can have at most one 
pointwise limit f (this explains why we can refer to f as the pointwise 
limit). However, it is of course possible for a sequence of functions 
to have no pointwise limit (can you think of an example?), just as a 
sequence of points in a metric space does not necessarily have a limit. 

Pointwise convergence is a very natural concept, but it has a number 
of disadvantages: it does not preserve continuity, derivatives, limits, or 
integrals, as the following three examples show. 


Example 3.2.4. Consider the functions f^(n) : [0,1] → R defined by 
f^(n)(x) := x^n, and let f : [0,1] → R be the function defined by setting 
f(x) := 1 when x = 1 and f(x) := 0 when 0 ≤ x < 1. Then the functions 
f^(n) are continuous, and converge pointwise to f on [0,1] (why? treat 
the cases x = 1 and 0 ≤ x < 1 separately), however the limiting function 
f is not continuous. Note that the same example shows that pointwise 
convergence does not preserve differentiability either. 


Example 3.2.5. If lim_{x→x_0; x∈X} f^(n)(x) = L for every n, and f^(n) 
converges pointwise to f, we cannot always conclude that 
lim_{x→x_0; x∈X} f(x) = L. The previous example is also a counterexample 
here: observe that lim_{x→1; x∈[0,1)} x^n = 1 for every n, but x^n converges 
pointwise to the function f defined in the previous example, 
and lim_{x→1; x∈[0,1)} f(x) = 0. In particular, we see that 


lim_{n→∞} lim_{x→x_0; x∈X} f^(n)(x) ≠ lim_{x→x_0; x∈X} lim_{n→∞} f^(n)(x) 


(cf. Example 1.2.8). Thus pointwise convergence does not preserve 
limits. 


Example 3.2.6. Suppose that f^(n) : [a,b] → R is a sequence of Riemann- 
integrable functions on the interval [a,b]. If ∫_{[a,b]} f^(n) = L for every 
n, and f^(n) converges pointwise to some new function f, this does not 
mean that ∫_{[a,b]} f = L. An example comes by setting [a,b] := [0,1], and 
letting f^(n) be the function f^(n)(x) := 2n when x ∈ [1/(2n), 1/n], and 
f^(n)(x) := 0 for all other values of x. Then f^(n) converges pointwise to 
the zero function f(x) := 0 (why?). On the other hand, ∫_{[0,1]} f^(n) = 1 
for every n, while ∫_{[0,1]} f = 0. In particular, we have an example where 


lim_{n→∞} ∫_{[a,b]} f^(n) ≠ ∫_{[a,b]} lim_{n→∞} f^(n). 


One may think that this counterexample has something to do with the 
f^(n) being discontinuous, but one can easily modify this counterexample 
to make the f^(n) continuous (can you see how?). 
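
The behaviour in this example can be tabulated with exact rational arithmetic. In the Python sketch below the names f and integral are our own; the integral is computed from the area of the rectangle (height 2n, width 1/n − 1/(2n)) rather than by numerical integration, and the sample point x = 1/10 is an arbitrary choice.

```python
from fractions import Fraction

def f(n, x):
    """f^(n)(x) = 2n when 1/(2n) <= x <= 1/n, and 0 otherwise."""
    return 2 * n if Fraction(1, 2 * n) <= x <= Fraction(1, n) else 0

def integral(n):
    """Exact integral of f^(n) over [0, 1]: height times width."""
    return 2 * n * (Fraction(1, n) - Fraction(1, 2 * n))

x = Fraction(1, 10)
print([f(n, x) for n in range(1, 16)])   # values at the fixed point x
print([integral(n) for n in (1, 10, 1000)])
```

At the fixed point x the values f^(n)(x) are eventually zero (the bump has moved to the left of x), while the integral of every f^(n) stays equal to 1.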

Another example in the same spirit is the “moving bump” example. 
Let f^(n) : R → R be the function defined by f^(n)(x) := 1 if x ∈ 
[n,n + 1] and f^(n)(x) := 0 otherwise. Then ∫_R f^(n) = 1 for every n 
(where ∫_R f is defined as the limit of ∫_{[−N,N]} f as N goes to infinity). On 
the other hand, f^(n) converges pointwise to the zero function 0 (why?), 
and ∫_R 0 = 0. In both of these examples, functions of area 1 have 
somehow “disappeared” to produce functions of area 0 in the limit. See 
also Example 1.2.9. 


These examples show that pointwise convergence is too weak a con- 
cept to be of much use. The problem is that while f^(n)(x) converges to 
f(x) for each x, the rate of that convergence varies substantially with 
x. For instance, consider the first example where f^(n) : [0,1] → R was 
the function f^(n)(x) := x^n, and f : [0,1] → R was the function such 
that f(x) := 1 when x = 1, and f(x) := 0 otherwise. Then for each 
x, f^(n)(x) converges to f(x) as n → ∞; this is the same as saying that 
lim_{n→∞} x^n = 0 when 0 ≤ x < 1, and that lim_{n→∞} x^n = 1 when x = 1. 
But the convergence is much slower near 1 than far away from 1. For 
instance, consider the statement that lim_{n→∞} x^n = 0 for all 0 ≤ x < 1. 
This means, for every 0 ≤ x < 1, that for every ε > 0, there exists an N ≥ 1 
such that |x^n| < ε for all n ≥ N - or in other words, the sequence 
1, x, x^2, x^3, ... will eventually get less than ε, after passing some finite 
number N of elements in this sequence. But the number of elements 
N one needs to go out to depends very much on the location of x. For 
instance, take ε := 0.1. If x = 0.1, then we have |x^n| < ε for all n ≥ 2 - 
the sequence gets underneath ε after the second element. But if x = 0.5, 
then we only get |x^n| < ε for n ≥ 4 - you have to wait until the fourth 
element to get within ε of the limit. And if x = 0.9, then one only has 
|x^n| < ε when n ≥ 22. Clearly, the closer x gets to 1, the longer one has 
to wait until f^(n)(x) will get within ε of f(x), although it still will get 
there eventually. (Curiously, however, while the convergence gets worse 
and worse as x approaches 1, the convergence suddenly becomes perfect 
when x = 1.) 
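
The waiting times quoted above can be recomputed directly. The following Python sketch (the helper name first_index_below is hypothetical) finds, for a given 0 ≤ x < 1 and ε > 0, the smallest n ≥ 1 with x^n < ε by repeated multiplication; the values of x beyond 0.9 are our own additions to show the trend.

```python
def first_index_below(x, eps):
    """Smallest n >= 1 with x**n < eps, assuming 0 <= x < 1 and eps > 0."""
    n, power = 1, x
    while power >= eps:
        n += 1
        power *= x
    return n

eps = 0.1
for x in (0.1, 0.5, 0.9, 0.99, 0.999):
    print(x, first_index_below(x, eps))
```

Since x^n is decreasing in n for these x, the index returned is exactly the N after which x^n stays below ε, and it grows without bound as x approaches 1.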

To put things another way, the convergence of f^(n) to f is not uniform 
in x - the N that one needs to get f^(n)(x) within ε of f(x) depends on x 
as well as on ε. This motivates a stronger notion of convergence. 


Definition 3.2.7 (Uniform convergence). Let (f^(n))_{n=1}^∞ be a sequence 
of functions from one metric space (X,d_X) to another (Y,d_Y), and let 
f : X → Y be another function. We say that (f^(n))_{n=1}^∞ converges 
uniformly to f on X if for every ε > 0 there exists N > 0 such that 
d_Y(f^(n)(x), f(x)) < ε for every n > N and x ∈ X. We call the function 
f the uniform limit of the functions f^(n). 


Remark 3.2.8. Note that this definition is subtly different from the 
definition for pointwise convergence in Definition 3.2.1. In the definition 
of pointwise convergence, N was allowed to depend on x; now it is not. 
The reader should compare this distinction to the distinction between 
continuity and uniform continuity (i.e., between Definition 2.1.1 and 
Definition 2.3.4). A more precise formulation of this analogy is given in 
Exercise 3.2.1. 


It is easy to see that if f^(n) converges uniformly to f on X, then 
it also converges pointwise to the same function f (see Exercise 3.2.2); 
thus when the uniform limit and pointwise limit both exist, then they 
have to be equal. However, the converse is not true; for instance the 
functions f^(n) : [0,1] → R defined earlier by f^(n)(x) := x^n converge 
pointwise, but do not converge uniformly (see Exercise 3.2.2). 


Example 3.2.9. Let f^(n) : [0,1] → R be the functions f^(n)(x) := x/n, 
and let f : [0,1] → R be the zero function f(x) := 0. Then it is clear that 
f^(n) converges to f pointwise. Now we show that in fact f^(n) converges 
to f uniformly. We have to show that for every ε > 0, there exists an 
N such that |f^(n)(x) − f(x)| < ε for every x ∈ [0,1] and every n ≥ N. 
To show this, let us fix an ε > 0. Then for any x ∈ [0,1] and n ≥ N, we 
have 


|f^(n)(x) − f(x)| = |x/n − 0| = x/n ≤ 1/n ≤ 1/N. 


Thus if we choose N such that N > 1/ε (note that this choice of N 
does not depend on what x is), then we have |f^(n)(x) − f(x)| < ε for all 
n ≥ N and x ∈ [0,1], as desired. 
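
A numerical comparison makes the contrast with the functions x^n concrete. In the sketch below (the helper names are hypothetical), linear_gap returns the largest value of |x/n − 0| on [0,1], which is 1/n, while power_gap_witness picks the point x_n := (1/2)^(1/n) in [0,1), chosen by us to depend on n, at which x^n is 1/2 up to rounding error.

```python
def linear_gap(n):
    """Largest value of |x/n - 0| for x in [0, 1], attained at x = 1."""
    return 1 / n

def power_gap_witness(n):
    """Return (x_n, |x_n**n - 0|) for the point x_n = (1/2)**(1/n)."""
    x_n = 0.5 ** (1 / n)
    return x_n, abs(x_n ** n)

for n in (1, 10, 100, 1000):
    x_n, gap = power_gap_witness(n)
    print(n, linear_gap(n), x_n, gap)
```

The first gap tends to zero, so a single N works for all x at once; the second gap stays near 1/2 for every n, so no N can bring x^n within, say, 0.1 of the pointwise limit at every point of [0,1).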


We make one trivial remark here: if a sequence f^(n) : X → Y of 
functions converges pointwise (or uniformly) to a function f : X → Y, 
then the restrictions f^(n)|_E : E → Y of f^(n) to some subset E of X will 
also converge pointwise (or uniformly) to f|_E. (Why?) 


— Exercises — 


Exercise 3.2.1. The purpose of this exercise is to demonstrate a concrete rela- 
tionship between continuity and pointwise convergence, and between uniform 
continuity and uniform convergence. Let f : R → R be a function. For any 
a ∈ R, let f_a : R → R be the shifted function f_a(x) := f(x − a). 


(a) Show that f is continuous if and only if, whenever (a_n)_{n=0}^∞ is a sequence of 
real numbers which converges to zero, the shifted functions f_{a_n} converge 
pointwise to f. 


(b) Show that f is uniformly continuous if and only if, whenever (a_n)_{n=0}^∞ is 
a sequence of real numbers which converges to zero, the shifted functions 
f_{a_n} converge uniformly to f. 


Exercise 3.2.2. (a) Let (f^(n))_{n=1}^∞ be a sequence of functions from one metric 
space (X,d_X) to another (Y,d_Y), and let f : X → Y be another function 
from X to Y. Show that if f^(n) converges uniformly to f, then f^(n) also 
converges pointwise to f. 


(b) For each integer n ≥ 1, let f^(n) : (−1,1) → R be the function f^(n)(x) := 
x^n. Prove that f^(n) converges pointwise to the zero function 0, but does 
not converge uniformly to any function f : (−1,1) → R. 


(c) Let g : (−1,1) → R be the function g(x) := x/(1 − x). With the notation 
as in (b), show that the partial sums ∑_{n=1}^N f^(n) converge pointwise as 
N → ∞ to g, but do not converge uniformly to g, on the open interval 
(−1,1). (Hint: use Lemma 7.3.3.) What would happen if we replaced 
the open interval (−1,1) with the closed interval [−1,1]? 


Exercise 3.2.3. Let (X,d_X) be a metric space, and for every integer n ≥ 1, let 
f_n : X → R be a real-valued function. Suppose that f_n converges pointwise 
to another function f : X → R on X (in this question we give R the standard 
metric d(x,y) = |x − y|). Let h : R → R be a continuous function. Show that 
the functions h ∘ f_n converge pointwise to h ∘ f on X, where h ∘ f_n : X → R 
is the function h ∘ f_n(x) := h(f_n(x)), and similarly for h ∘ f. 


Exercise 3.2.4. Let f_n : X → Y be a sequence of bounded functions from 
one metric space (X,d_X) to another metric space (Y,d_Y). Suppose that f_n 
converges uniformly to another function f : X → Y. Suppose that f is a 
bounded function; i.e., there exists a ball B_{(Y,d_Y)}(y_0, R) in Y such that f(x) ∈ 
B_{(Y,d_Y)}(y_0, R) for all x ∈ X. Show that the sequence f_n is uniformly bounded; 
i.e. there exists a ball B_{(Y,d_Y)}(y_0, R) in Y such that f_n(x) ∈ B_{(Y,d_Y)}(y_0, R) for 
all x ∈ X and all positive integers n. 


3.3 Uniform convergence and continuity 


We now give the first demonstration that uniform convergence is signifi- 
cantly better than pointwise convergence. Specifically, we show that the 
uniform limit of continuous functions is continuous. 


Theorem 3.3.1 (Uniform limits preserve continuity I). Suppose 
(f^(n))_{n=1}^∞ is a sequence of functions from one metric space (X,d_X) to 
another (Y,d_Y), and suppose that this sequence converges uniformly to 
another function f : X → Y. Let x_0 be a point in X. If the functions 
f^(n) are continuous at x_0 for each n, then the limiting function f is also 
continuous at x_0. 


Proof. See Exercise 3.3.1. 


This has an immediate corollary: 


Corollary 3.3.2 (Uniform limits preserve continuity II). Let (f^(n))_{n=1}^∞ 
be a sequence of functions from one metric space (X,d_X) to another 
(Y,d_Y), and suppose that this sequence converges uniformly to another 
function f : X → Y. If the functions f^(n) are continuous on X for each 
n, then the limiting function f is also continuous on X. 


This should be contrasted with Example 3.2.4. There is a slight 
variant of Theorem 3.3.1 which is also useful: 


Proposition 3.3.3 (Interchange of limits and uniform limits). Let 
(X,d_X) and (Y,d_Y) be metric spaces, with Y complete, and let E be 
a subset of X. Let (f^(n))_{n=1}^∞ be a sequence of functions from E to Y, 
and suppose that this sequence converges uniformly in E to some function 
f : E → Y. Let x_0 ∈ X be an adherent point of E, and suppose 
that for each n the limit lim_{x→x_0; x∈E} f^(n)(x) exists. Then the limit 
lim_{x→x_0; x∈E} f(x) also exists, and is equal to the limit of the sequence 
(lim_{x→x_0; x∈E} f^(n)(x))_{n=1}^∞; in other words we have the interchange of limits 


lim_{n→∞} lim_{x→x_0; x∈E} f^(n)(x) = lim_{x→x_0; x∈E} lim_{n→∞} f^(n)(x). 


Proof. See Exercise 3.3.2. 


This should be contrasted with Example 3.2.5. Finally, we have a 
version of these theorems for sequences: 


Proposition 3.3.4. Let (f^(n))_{n=1}^∞ be a sequence of continuous functions 
from one metric space (X,d_X) to another (Y,d_Y), and suppose that this 
sequence converges uniformly to another function f : X → Y. Let x^(n) 
be a sequence of points in X which converges to some limit x. Then 
f^(n)(x^(n)) converges (in Y) to f(x). 


Proof. See Exercise 3.3.4. 


A similar result holds for bounded functions: 


Definition 3.3.5 (Bounded functions). A function $f : X \to Y$ from
one metric space $(X, d_X)$ to another $(Y, d_Y)$ is bounded if $f(X)$ is a
bounded set, i.e., there exists a ball $B_{(Y,d_Y)}(y_0, R)$ in $Y$ such that
$f(x) \in B_{(Y,d_Y)}(y_0, R)$ for all $x \in X$.


Proposition 3.3.6 (Uniform limits preserve boundedness). Let
$(f^{(n)})_{n=1}^\infty$ be a sequence of functions from one metric space $(X, d_X)$ to
another $(Y, d_Y)$, and suppose that this sequence converges uniformly to
another function $f : X \to Y$. If the functions $f^{(n)}$ are bounded on $X$
for each $n$, then the limiting function $f$ is also bounded on $X$.


Proof. See Exercise 3.3.6.


Remark 3.3.7. The above propositions sound very reasonable, but one
should caution that they only work if one assumes uniform convergence;
pointwise convergence is not enough. (See Exercises 3.3.3, 3.3.5, 3.3.7.)
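

A small numerical sketch may help make this remark concrete. It uses the standard functions x^n on [0,1] as an assumed illustration (the text's own examples may differ), and the sample grids are arbitrary choices. The pointwise limit is 0 for x < 1 and 1 at x = 1, which is discontinuous even though every x^n is continuous.

```python
def sup_gap(n, xs):
    """Largest value of |x**n - limit(x)| over the sample points xs."""
    def limit(x):
        # Pointwise limit of x**n on [0, 1]: 0 for x < 1 and 1 at x = 1.
        return 1.0 if x == 1.0 else 0.0
    return max(abs(x**n - limit(x)) for x in xs)

# Sample points of [0, 1] that crowd towards 1 (an arbitrary choice).
near_one = [1.0 - 10.0**(-k) for k in range(1, 9)] + [1.0]
# Sample points of the smaller interval [0, 1/2].
half = [k / 200.0 for k in range(101)]

for n in (5, 50, 500):
    # First gap stays close to 1 (no uniform convergence on [0, 1]);
    # second gap shrinks like (1/2)**n (uniform convergence on [0, 1/2]).
    print(n, sup_gap(n, near_one), sup_gap(n, half))
```

On [0,1] the sampled gap does not shrink as n grows, so the convergence there is pointwise but not uniform, and continuity of the limit is lost.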


— Exercises — 


Exercise 3.3.1. Prove Theorem 3.3.1. Explain briefly why your proof requires 
uniform convergence, and why pointwise convergence would not suffice. (Hints: 
it is easiest to use the “epsilon-delta” definition of continuity from Definition 
2.1.1. You may find the triangle inequality 


\[ d_Y(f(x), f(x_0)) \leq d_Y(f(x), f^{(n)}(x)) + d_Y(f^{(n)}(x), f^{(n)}(x_0)) + d_Y(f^{(n)}(x_0), f(x_0)) \]


useful. Also, you may need to divide $\varepsilon$ as $\varepsilon = \varepsilon/3 + \varepsilon/3 + \varepsilon/3$. Finally, it is
possible to prove Theorem 3.3.1 from Proposition 3.3.3, but you may find it 
easier conceptually to prove Theorem 3.3.1 first.) 


Exercise 3.3.2. Prove Proposition 3.3.3. (Hint: this is very similar to Theorem 
3.3.1. Theorem 3.3.1 cannot be used to prove Proposition 3.3.3, however it is 
possible to use Proposition 3.3.3 to prove Theorem 3.3.1.) 


Exercise 3.3.3. Compare Proposition 3.3.3 with Example 1.2.8. Can you now 
explain why the interchange of limits in Example 1.2.8 led to a false statement, 
whereas the interchange of limits in Proposition 3.3.3 is justified? 


Exercise 3.3.4. Prove Proposition 3.3.4. (Hint: again, this is similar to Theo- 
rem 3.3.1 and Proposition 3.3.3, although the statements are slightly different, 
and one cannot deduce this directly from the other two results.) 


Exercise 3.3.5. Give an example to show that Proposition 3.3.4 fails if the 
phrase “converges uniformly” is replaced by “converges pointwise”. (Hint: 
some of the examples already given earlier will already work here.) 


Exercise 3.3.6. Prove Proposition 3.3.6. Discuss how this proposition differs 
from Exercise 3.2.4. 


Exercise 3.3.7. Give an example to show that Proposition 3.3.6 fails if the 
phrase “converges uniformly” is replaced by “converges pointwise”. (Hint: 
some of the examples already given earlier will already work here.) 


Exercise 3.3.8. Let $(X, d)$ be a metric space, and for every positive integer $n$,
let $f_n : X \to \mathbf{R}$ and $g_n : X \to \mathbf{R}$ be functions. Suppose that $(f_n)_{n=1}^\infty$
converges uniformly to another function $f : X \to \mathbf{R}$, and that $(g_n)_{n=1}^\infty$ converges
uniformly to another function $g : X \to \mathbf{R}$. Suppose also that the functions
$(f_n)_{n=1}^\infty$ and $(g_n)_{n=1}^\infty$ are uniformly bounded, i.e., there exists an $M > 0$ such
that $|f_n(x)| \leq M$ and $|g_n(x)| \leq M$ for all $n \geq 1$ and $x \in X$. Prove that the
functions $f_n g_n : X \to \mathbf{R}$ converge uniformly to $fg : X \to \mathbf{R}$.


3.4 The metric of uniform convergence


We have now developed at least four, apparently separate, notions of 
limit in this text: 


(a) limits $\lim_{n \to \infty} x^{(n)}$ of sequences of points in a metric space
(Definition 1.1.14; see also Definition 2.5.4);


(b) limiting values $\lim_{x \to x_0; x \in E} f(x)$ of functions at a point (Definition
3.1.1);


(c) pointwise limits $f$ of functions $f^{(n)}$ (Definition 3.2.1); and


(d) uniform limits $f$ of functions $f^{(n)}$ (Definition 3.2.7).


This proliferation of limits may seem rather complicated. However, 
we can reduce the complexity slightly by observing that (d) can be 
viewed as a special case of (a), though in doing so it should be cautioned 
that because we are now dealing with functions instead of points, the 
convergence is not in X or in Y, but rather in a new space, the space of 
functions from X to Y. 


Remark 3.4.1. If one is willing to work in topological spaces instead of 
metric spaces, we can also view (b) as a special case of (a), see Exercise 
3.1.4, and (c) is also a special case of (a), see Exercise 3.4.4. Thus the 
notion of convergence in a topological space can be used to unify all the 
notions of limits we have encountered so far. 


Definition 3.4.2 (Metric space of bounded functions). Suppose $(X, d_X)$
and $(Y, d_Y)$ are metric spaces. We let $B(X \to Y)$ denote the space^1 of
bounded functions from $X$ to $Y$:

\[ B(X \to Y) := \{ f \mid f : X \to Y \text{ is a bounded function} \}. \]

We define a metric $d_\infty : B(X \to Y) \times B(X \to Y) \to \mathbf{R}^+$ by defining

\[ d_\infty(f, g) := \sup_{x \in X} d_Y(f(x), g(x)) = \sup\{ d_Y(f(x), g(x)) : x \in X \} \]

for all $f, g \in B(X \to Y)$. This metric is sometimes known as the sup
norm metric or the $L^\infty$ metric. We will also use $d_{B(X \to Y)}$ as a synonym
for $d_\infty$.


^1 Note that this is a set, thanks to the power set axiom (Axiom 3.10) and the axiom
of specification (Axiom 3.5).


Notice that the distance $d_\infty(f, g)$ is always finite because $f$ and $g$
are assumed to be bounded on $X$.


Example 3.4.3. Let $X := [0,1]$ and $Y := \mathbf{R}$. Let $f : [0,1] \to \mathbf{R}$ and
$g : [0,1] \to \mathbf{R}$ be the functions $f(x) := 2x$ and $g(x) := 3x$. Then $f$
and $g$ are both bounded functions and thus live in $B([0,1] \to \mathbf{R})$. The
distance between them is

\[ d_\infty(f, g) = \sup_{x \in [0,1]} |2x - 3x| = \sup_{x \in [0,1]} |x| = 1. \]
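

As a rough numerical check of this example, the sup metric can be approximated by a maximum over a finite grid. This is only a sketch: the helper name `sup_distance` and the grid size are illustrative assumptions, and a grid maximum is in general only a lower bound for the true supremum. The example is read here as f(x) = 2x and g(x) = 3x.

```python
def sup_distance(f, g, xs):
    """Approximate d_inf(f, g) = sup |f(x) - g(x)| by a maximum over the sample points xs."""
    return max(abs(f(x) - g(x)) for x in xs)

# Sample points of [0, 1], endpoints included (grid size is an arbitrary choice).
grid = [k / 1000.0 for k in range(1001)]
f = lambda x: 2.0 * x
g = lambda x: 3.0 * x

# The supremum of |2x - 3x| = |x| on [0, 1] is attained at the grid point x = 1.
print(sup_distance(f, g, grid))
```

Here the supremum is attained at x = 1, which lies on the grid, so the approximation agrees with the value computed by hand.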


This space turns out to be a metric space (Exercise 3.4.1). Conver- 
gence in this metric turns out to be identical to uniform convergence: 


Proposition 3.4.4. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Let
$(f^{(n)})_{n=1}^\infty$ be a sequence of functions in $B(X \to Y)$, and let $f$ be another
function in $B(X \to Y)$. Then $(f^{(n)})_{n=1}^\infty$ converges to $f$ in the
metric $d_{B(X \to Y)}$ if and only if $(f^{(n)})_{n=1}^\infty$ converges uniformly to $f$.


Proof. See Exercise 3.4.2. 


Now let $C(X \to Y)$ be the space of bounded continuous functions
from $X$ to $Y$:

\[ C(X \to Y) := \{ f \in B(X \to Y) \mid f \text{ is continuous} \}. \]

This set $C(X \to Y)$ is clearly a subset of $B(X \to Y)$. Corollary
3.3.2 asserts that this space $C(X \to Y)$ is closed in $B(X \to Y)$ (why?).
Actually, we can say a lot more:


Theorem 3.4.5 (The space of continuous functions is complete). Let
$(X, d_X)$ be a metric space, and let $(Y, d_Y)$ be a complete metric space.
The space $(C(X \to Y), d_{B(X \to Y)}|_{C(X \to Y) \times C(X \to Y)})$ is a complete
subspace of $(B(X \to Y), d_{B(X \to Y)})$. In other words, every Cauchy sequence
of functions in $C(X \to Y)$ converges to a function in $C(X \to Y)$.


Proof. See Exercise 3.4.3. 


— Exercises — 


Exercise 3.4.1. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Show that the space
$B(X \to Y)$ defined in Definition 3.4.2, with the metric $d_{B(X \to Y)}$, is indeed a
metric space.


Exercise 3.4.2. Prove Proposition 3.4.4.


Exercise 3.4.3. Prove Theorem 3.4.5. (Hint: this is similar, but not identical,
to the proof of Theorem 3.3.1.)


Exercise 3.4.4. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and let
$Y^X := \{ f \mid f : X \to Y \}$ be the space of all functions from $X$ to $Y$
(cf. Axiom 3.10). If $x_0 \in X$ and $V$ is an open set in $Y$, let
$V^{(x_0)} \subseteq Y^X$ be the set

\[ V^{(x_0)} := \{ f \in Y^X : f(x_0) \in V \}. \]

If $E$ is a subset of $Y^X$, we say that $E$ is open if for every $f \in E$, there exists a
finite number of points $x_1, \dots, x_n \in X$ and open sets $V_1, \dots, V_n \subseteq Y$ such that

\[ f \in V_1^{(x_1)} \cap \dots \cap V_n^{(x_n)} \subseteq E. \]


• Show that if $\mathcal{F}$ is the collection of open sets in $Y^X$, then $(Y^X, \mathcal{F})$ is a
topological space.


• For each natural number $n$, let $f^{(n)} : X \to Y$ be a function from $X$ to
$Y$, and let $f : X \to Y$ be another function from $X$ to $Y$. Show that $f^{(n)}$
converges to $f$ in the topology $\mathcal{F}$ (in the sense of Definition 2.5.4) if and
only if $f^{(n)}$ converges to $f$ pointwise (in the sense of Definition 3.2.1).

The topology $\mathcal{F}$ is known as the topology of pointwise convergence, for obvious
reasons; it is also known as the product topology. It shows that the concept
of pointwise convergence can be viewed as a special case of the more general
concept of convergence in a topological space.


3.5 Series of functions; the Weierstrass M-test 


Having discussed sequences of functions, we now discuss infinite series 
$\sum_{n=1}^\infty f_n$ of functions. Now we shall restrict our attention to functions
$f : X \to \mathbf{R}$ from a metric space $(X, d_X)$ to the real line $\mathbf{R}$ (which we of
course give the standard metric); this is because we know how to add 
two real numbers, but don’t necessarily know how to add two points in 
a general metric space Y. Functions whose range is R are sometimes 
called real-valued functions. 

Finite summation is, of course, easy: given any finite collection
$f^{(1)}, \dots, f^{(N)}$ of functions from $X$ to $\mathbf{R}$, we can define the finite sum
$\sum_{i=1}^N f^{(i)} : X \to \mathbf{R}$ by

\[ \Bigl( \sum_{i=1}^N f^{(i)} \Bigr)(x) := \sum_{i=1}^N f^{(i)}(x). \]


Example 3.5.1. If $f^{(1)} : \mathbf{R} \to \mathbf{R}$ is the function $f^{(1)}(x) := x$,
$f^{(2)} : \mathbf{R} \to \mathbf{R}$ is the function $f^{(2)}(x) := x^2$, and $f^{(3)} : \mathbf{R} \to \mathbf{R}$ is the function
$f^{(3)}(x) := x^3$, then $f := \sum_{i=1}^3 f^{(i)}$ is the function $f : \mathbf{R} \to \mathbf{R}$ defined by
$f(x) := x + x^2 + x^3$.


It is easy to show that finite sums of bounded functions are bounded, 
and finite sums of continuous functions are continuous (Exercise 3.5.1). 
Now to add infinite series. 


Definition 3.5.2 (Infinite series). Let $(X, d_X)$ be a metric space. Let
$(f^{(n)})_{n=1}^\infty$ be a sequence of functions from $X$ to $\mathbf{R}$, and let $f$ be another
function from $X$ to $\mathbf{R}$. If the partial sums $\sum_{n=1}^N f^{(n)}$ converge pointwise
to $f$ on $X$ as $N \to \infty$, we say that the infinite series $\sum_{n=1}^\infty f^{(n)}$ converges
pointwise to $f$, and write $f = \sum_{n=1}^\infty f^{(n)}$. If the partial sums $\sum_{n=1}^N f^{(n)}$
converge uniformly to $f$ on $X$ as $N \to \infty$, we say that the infinite series
$\sum_{n=1}^\infty f^{(n)}$ converges uniformly to $f$, and again write $f = \sum_{n=1}^\infty f^{(n)}$.
(Thus when one sees an expression such as $f = \sum_{n=1}^\infty f^{(n)}$, one should
look at the context to see in what sense this infinite series converges.)


Remark 3.5.3. A series $\sum_{n=1}^\infty f^{(n)}$ converges pointwise to $f$ on $X$ if
and only if $\sum_{n=1}^\infty f^{(n)}(x)$ converges to $f(x)$ for every $x \in X$. (Thus if
$\sum_{n=1}^\infty f^{(n)}$ does not converge pointwise to $f$, this does not mean that it
diverges pointwise; it may just be that it converges for some points $x$
but diverges at other points $x$.)


If a series $\sum_{n=1}^\infty f^{(n)}$ converges uniformly to $f$, then it also converges
pointwise to $f$; but not vice versa, as the following example shows.


Example 3.5.4. Let $f^{(n)} : (-1,1) \to \mathbf{R}$ be the sequence of functions
$f^{(n)}(x) := x^n$. Then $\sum_{n=1}^\infty f^{(n)}$ converges pointwise, but not uniformly,
to the function $x/(1-x)$ (see Exercise 3.2.2 and Example 3.5.8).


It is not always clear whether a series $\sum_{n=1}^\infty f^{(n)}$ converges or not.
However, there is a very useful test that gives at least one sufficient
condition for uniform convergence.


Definition 3.5.5 (Sup norm). If $f : X \to \mathbf{R}$ is a bounded real-valued
function, we define the sup norm $\|f\|_\infty$ of $f$ to be the number

\[ \|f\|_\infty := \sup\{ |f(x)| : x \in X \}. \]

In other words, $\|f\|_\infty = d_\infty(f, 0)$, where $0 : X \to \mathbf{R}$ is the zero function
$0(x) := 0$, and $d_\infty$ was defined in Definition 3.4.2. (Why is this the
case?)


Example 3.5.6. Thus, for instance, if $f : (-2,1) \to \mathbf{R}$ is the function
$f(x) := 2x$, then $\|f\|_\infty = \sup\{ |2x| : x \in (-2,1) \} = 4$ (why?). Notice
that when $f$ is bounded then $\|f\|_\infty$ will always be a non-negative real
number.


Theorem 3.5.7 (Weierstrass M-test). Let $(X, d)$ be a metric space, and
let $(f^{(n)})_{n=1}^\infty$ be a sequence of bounded real-valued continuous functions
on $X$ such that the series $\sum_{n=1}^\infty \|f^{(n)}\|_\infty$ is convergent. (Note that this
is a series of plain old real numbers, not of functions.) Then the series
$\sum_{n=1}^\infty f^{(n)}$ converges uniformly to some function $f$ on $X$, and that
function $f$ is also continuous.


Proof. See Exercise 3.5.2. 


To put the Weierstrass M-test succinctly: absolute convergence of 
sup norms implies uniform convergence of functions. 


Example 3.5.8. Let $0 < r < 1$ be a real number, and let $f^{(n)} : [-r,r] \to \mathbf{R}$
be the series of functions $f^{(n)}(x) := x^n$. Then each $f^{(n)}$ is continuous
and bounded, and $\|f^{(n)}\|_\infty = r^n$ (why?). Since the series $\sum_{n=1}^\infty r^n$ is
absolutely convergent (e.g., by the ratio test, Theorem 7.5.1), we thus
see that $\sum_{n=1}^\infty f^{(n)}$ converges uniformly in $[-r,r]$ to some continuous function;
in Exercise 3.2.2(c) we see that this function must in fact be the function
$f : [-r,r] \to \mathbf{R}$ defined by $f(x) := x/(1-x)$. In other words, the
series $\sum_{n=1}^\infty x^n$ is pointwise convergent, but not uniformly convergent,
on $(-1,1)$, but is uniformly convergent on the smaller interval $[-r,r]$ for
any $0 < r < 1$.
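

The following sketch compares, on a sample grid of [-r, r], the error of the partial sums of the geometric series with the tail of the series of sup norms, which is the quantity the M-test controls. The function names, the choice r = 0.9, and the grid size are illustrative assumptions rather than anything fixed by the text.

```python
def partial_sum(x, N):
    """S_N(x) = x + x**2 + ... + x**N."""
    return sum(x**n for n in range(1, N + 1))

def sup_error(r, N, samples=401):
    """Maximum over a grid of [-r, r] of |S_N(x) - x/(1 - x)|."""
    xs = [-r + 2.0 * r * k / (samples - 1) for k in range(samples)]
    return max(abs(partial_sum(x, N) - x / (1.0 - x)) for x in xs)

def tail_bound(r, N):
    """Sum of the sup norms r**n over n > N, which equals r**(N+1) / (1 - r)."""
    return r**(N + 1) / (1.0 - r)

r = 0.9
for N in (10, 50, 100):
    # The sampled error should not exceed the tail bound (up to rounding).
    print(N, sup_error(r, N), tail_bound(r, N))
```

The bound depends on N and r only, not on the point x, which is exactly what uniform convergence on [-r, r] requires. As r approaches 1 the bound blows up, consistent with the failure of uniform convergence on (-1,1).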


The Weierstrass M-test is especially useful in relation to power se- 
ries, which we will encounter in the next chapter. 


— Exercises — 


Exercise 3.5.1. Let $f^{(1)}, \dots, f^{(N)}$ be a finite sequence of bounded functions
from a metric space $(X, d_X)$ to $\mathbf{R}$. Show that $\sum_{i=1}^N f^{(i)}$ is also bounded. Prove
a similar claim when "bounded" is replaced by "continuous". What if "continuous"
was replaced by "uniformly continuous"?


Exercise 3.5.2. Prove Theorem 3.5.7. (Hint: first show that the sequence
$\sum_{n=1}^N f^{(n)}$ is a Cauchy sequence in $C(X \to \mathbf{R})$. Then use Theorem 3.4.5.)


3.6 Uniform convergence and integration


We now connect uniform convergence with Riemann integration (which 
was discussed in Chapter 11), by showing that uniform limits can be 
safely interchanged with integrals. 


Theorem 3.6.1. Let $[a,b]$ be an interval, and for each integer $n \geq 1$,
let $f^{(n)} : [a,b] \to \mathbf{R}$ be a Riemann-integrable function. Suppose $f^{(n)}$
converges uniformly on $[a,b]$ to a function $f : [a,b] \to \mathbf{R}$. Then $f$ is also
Riemann integrable, and

\[ \lim_{n \to \infty} \int_{[a,b]} f^{(n)} = \int_{[a,b]} f. \]


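Proof. We first show that $f$ is Riemann integrable on $[a,b]$. This is the
same as showing that the upper and lower Riemann integrals of $f$ match:
$\overline{\int}_{[a,b]} f = \underline{\int}_{[a,b]} f$.

Let $\varepsilon > 0$. Since $f^{(n)}$ converges uniformly to $f$, we see that there
exists an $N > 0$ such that $|f^{(n)}(x) - f(x)| < \varepsilon$ for all $n \geq N$ and
$x \in [a,b]$. In particular we have

\[ f^{(n)}(x) - \varepsilon < f(x) < f^{(n)}(x) + \varepsilon \]

for all $x \in [a,b]$. Integrating this on $[a,b]$ we obtain

\[ \underline{\int}_{[a,b]} (f^{(n)} - \varepsilon) \leq \underline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} (f^{(n)} + \varepsilon). \]

Since $f^{(n)}$ is assumed to be Riemann integrable, we thus see

\[ \Bigl( \int_{[a,b]} f^{(n)} \Bigr) - \varepsilon(b-a) \leq \underline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} f \leq \Bigl( \int_{[a,b]} f^{(n)} \Bigr) + \varepsilon(b-a). \]

In particular, we see that

\[ 0 \leq \overline{\int}_{[a,b]} f - \underline{\int}_{[a,b]} f \leq 2\varepsilon(b-a). \]

Since this is true for every $\varepsilon > 0$, we obtain $\underline{\int}_{[a,b]} f = \overline{\int}_{[a,b]} f$ as desired.

The above argument also shows that for every $\varepsilon > 0$ there exists an
$N > 0$ such that

\[ \Bigl| \int_{[a,b]} f^{(n)} - \int_{[a,b]} f \Bigr| \leq 2\varepsilon(b-a) \]

for all $n \geq N$. Since $\varepsilon$ is arbitrary, we see that $\int_{[a,b]} f^{(n)}$ converges to
$\int_{[a,b]} f$ as desired.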


To rephrase Theorem 3.6.1: we can rearrange limits and integrals
(on compact intervals $[a,b]$),

\[ \lim_{n \to \infty} \int_{[a,b]} f^{(n)} = \int_{[a,b]} \lim_{n \to \infty} f^{(n)}, \]


provided that the convergence is uniform. This should be contrasted 
with Example 3.2.5 and Example 1.2.9. 
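

To see the interchange in a concrete case, here is a small sketch using the hypothetical functions f_n(x) := x + sin(nx)/n on [0,1]. These are chosen for this illustration and do not come from the text. They converge uniformly to f(x) := x, since |f_n(x) - f(x)| is at most 1/n, and their integrals have the closed form 1/2 + (1 - cos n)/n^2.

```python
import math

def integral_f_n(n):
    """Exact integral over [0, 1] of f_n(x) = x + sin(n x)/n (closed form)."""
    return 0.5 + (1.0 - math.cos(n)) / n**2

def sup_distance_bound(n):
    """|f_n(x) - x| = |sin(n x)|/n is at most 1/n for every x in [0, 1]."""
    return 1.0 / n

limit_integral = 0.5   # integral over [0, 1] of the uniform limit f(x) = x
for n in (1, 10, 100, 1000):
    gap = abs(integral_f_n(n) - limit_integral)
    # The gap between the integrals is controlled by (sup distance) * (b - a).
    print(n, gap, sup_distance_bound(n))
```

The gap between the integrals never exceeds the sup distance times the length of the interval, which is the mechanism behind the proof of Theorem 3.6.1.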
There is an analogue of this theorem for series: 


Corollary 3.6.2. Let $[a,b]$ be an interval, and let $(f^{(n)})_{n=1}^\infty$ be a
sequence of Riemann integrable functions on $[a,b]$ such that the series
$\sum_{n=1}^\infty f^{(n)}$ is uniformly convergent. Then we have

\[ \sum_{n=1}^\infty \int_{[a,b]} f^{(n)} = \int_{[a,b]} \sum_{n=1}^\infty f^{(n)}. \]


Proof. See Exercise 3.6.1.


This corollary works particularly well in conjunction with the Weierstrass
M-test (Theorem 3.5.7):


Example 3.6.3. (Informal) From Lemma 7.3.3 we have the geometric
series identity

\[ \sum_{n=1}^\infty x^n = \frac{x}{1-x} \]

for $x \in (-1,1)$, and the convergence is uniform (by the Weierstrass
M-test) on $[-r,r]$ for any $0 < r < 1$. By adding 1 to both sides we
obtain

\[ \sum_{n=0}^\infty x^n = \frac{1}{1-x} \]

and the convergence is again uniform. We can thus integrate on $[0,r]$ and
use Corollary 3.6.2 to obtain

\[ \sum_{n=0}^\infty \int_{[0,r]} x^n \, dx = \int_{[0,r]} \frac{1}{1-x} \, dx. \]

The left-hand side is $\sum_{n=0}^\infty r^{n+1}/(n+1)$. If we accept for now the use
of logarithms (we will justify this use in Section 4.5), the anti-derivative
of $1/(1-x)$ is $-\log(1-x)$, and so the right-hand side is $-\log(1-r)$.
We thus obtain the formula

\[ -\log(1-r) = \sum_{n=0}^\infty \frac{r^{n+1}}{n+1} \]

for all $0 < r < 1$.
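

The formula can be sanity-checked numerically. The sketch below sums finitely many terms of the series and compares against the logarithm from Python's standard `math` module; the helper name `log_series` and the number of terms are arbitrary choices made for this illustration.

```python
import math

def log_series(r, terms):
    """Partial sum of r**(n+1) / (n+1) for n = 0, ..., terms - 1."""
    return sum(r**(n + 1) / (n + 1) for n in range(terms))

for r in (0.1, 0.5, 0.9):
    # Left column: truncated series; right column: -log(1 - r).
    print(r, log_series(r, 400), -math.log(1.0 - r))
```

The closer r is to 1, the more terms are needed for a given accuracy, in line with the convergence being uniform only on intervals [0, r] with r < 1.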
— Exercises — 


Exercise 3.6.1. Use Theorem 3.6.1 to prove Corollary 3.6.2. 


3.7 Uniform convergence and derivatives 


We have already seen how uniform convergence interacts well with con- 
tinuity, with limits, and with integrals. Now we investigate how it in- 
teracts with derivatives. 

The first question we can ask is: if $f_n$ converges uniformly to $f$,
and the functions $f_n$ are differentiable, does this imply that $f$ is also
differentiable? And does $f_n'$ also converge to $f'$?

The answer to the second question is, unfortunately, no. To
see a counterexample, we will use without proof some basic facts
about trigonometric functions (which we will make rigorous in Section
4.7). Consider the functions $f_n : [0, 2\pi] \to \mathbf{R}$ defined by
$f_n(x) := n^{-1/2} \sin(nx)$, and let $f : [0, 2\pi] \to \mathbf{R}$ be the zero function
$f(x) := 0$. Then, since $\sin$ takes values between $-1$ and $1$, we
have $d_\infty(f_n, f) \leq n^{-1/2}$, where we use the uniform metric
$d_\infty(f, g) := \sup_{x \in [0, 2\pi]} |f(x) - g(x)|$ introduced in Definition 3.4.2.
Since $n^{-1/2}$ converges to 0, we thus see by the squeeze test that $f_n$ converges uniformly
to $f$. On the other hand, $f_n'(x) = n^{1/2} \cos(nx)$, and so in particular
$|f_n'(0) - f'(0)| = n^{1/2}$. Thus $f_n'$ does not converge pointwise to $f'$, and
so in particular does not converge uniformly either. In particular we
have

\[ \frac{d}{dx} \lim_{n \to \infty} f_n(x) \neq \lim_{n \to \infty} \frac{d}{dx} f_n(x). \]

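
A quick numerical look at this counterexample, taking f_n(x) = n^(-1/2) sin(nx) on [0, 2π] as in the paragraph above, shows the two competing behaviours side by side. The grid and the chosen values of n are arbitrary, and the derivative formula is entered by hand rather than computed.

```python
import math

def f(n, x):
    """f_n(x) = n**(-1/2) * sin(n x)."""
    return math.sin(n * x) / math.sqrt(n)

def f_prime(n, x):
    """f_n'(x) = n**(1/2) * cos(n x), differentiated by hand."""
    return math.sqrt(n) * math.cos(n * x)

# Sample points of [0, 2*pi] (grid size is an arbitrary choice).
grid = [2.0 * math.pi * k / 2000 for k in range(2001)]
for n in (1, 100, 10000):
    sup_f = max(abs(f(n, x)) for x in grid)
    # sup_f is at most n**(-1/2) and shrinks; the derivative at 0 is n**(1/2) and grows.
    print(n, sup_f, f_prime(n, 0.0))
```

The functions become uniformly small while their slopes at 0 become arbitrarily large: small high-frequency oscillations are invisible to the uniform metric but dominate the derivative.
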
The answer to the first question is also no. An example is the
sequence of functions $f_n : [-1,1] \to \mathbf{R}$ defined by $f_n(x) := \sqrt{\frac{1}{n^2} + x^2}$.
These functions are differentiable (why?). Also, one can easily check
that

\[ |x| \leq f_n(x) \leq |x| + \frac{1}{n} \]

for all $x \in [-1,1]$ (why? square both sides), and so by the squeeze test
$f_n$ converges uniformly to the absolute value function $f(x) := |x|$. But
this function is not differentiable at 0 (why?). Thus, the uniform limit
of differentiable functions need not be differentiable. (See also Example
1.2.10.)
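

One way to carry out the "square both sides" hint is sketched below, reading the functions as $f_n(x) = \sqrt{1/n^2 + x^2}$. All three quantities being compared are non-negative, so comparing their squares is legitimate.

```latex
\[
  |x|^2 \;=\; x^2 \;\le\; \frac{1}{n^2} + x^2 \;=\; f_n(x)^2
  \;\le\; x^2 + \frac{2|x|}{n} + \frac{1}{n^2}
  \;=\; \Bigl( |x| + \frac{1}{n} \Bigr)^2 .
\]
```

Taking square roots gives $|x| \le f_n(x) \le |x| + 1/n$, hence $\sup_{x \in [-1,1]} |f_n(x) - |x|| \le 1/n$, which tends to 0.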

So, in summary, uniform convergence of the functions $f_n$ says nothing
about the convergence of the derivatives $f_n'$. However, the converse is
true, as long as $f_n$ converges at at least one point:


Theorem 3.7.1. Let $[a,b]$ be an interval, and for every integer $n \geq 1$, let
$f_n : [a,b] \to \mathbf{R}$ be a differentiable function whose derivative
$f_n' : [a,b] \to \mathbf{R}$ is continuous. Suppose that the derivatives $f_n'$ converge uniformly to
a function $g : [a,b] \to \mathbf{R}$. Suppose also that there exists a point $x_0$ such
that the limit $\lim_{n \to \infty} f_n(x_0)$ exists. Then the functions $f_n$ converge
uniformly to a differentiable function $f$, and the derivative of $f$ equals
$g$.

Informally, the above theorem says that if $f_n'$ converges uniformly,
and $f_n(x_0)$ converges for some $x_0$, then $f_n$ also converges uniformly, and
$\frac{d}{dx} \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{d}{dx} f_n(x)$.

Proof. We only give the beginning of the proof here; the remainder of 
the proof will be an exercise (Exercise 3.7.1). 


Since $f_n'$ is continuous, we see from the fundamental theorem of calculus
(Theorem 11.9.4) that

\[ f_n(x) - f_n(x_0) = \int_{[x_0,x]} f_n' \]

when $x \in [x_0, b]$, and

\[ f_n(x) - f_n(x_0) = -\int_{[x,x_0]} f_n' \]

when $x \in [a, x_0]$.

Let $L$ be the limit of $f_n(x_0)$ as $n \to \infty$:

\[ L := \lim_{n \to \infty} f_n(x_0). \]

By hypothesis, $L$ exists. Now, since each $f_n'$ is continuous on $[a,b]$, and
$f_n'$ converges uniformly to $g$, we see by Corollary 3.3.2 that $g$ is also
continuous. Now define the function $f : [a,b] \to \mathbf{R}$ by setting

\[ f(x) := L - \int_{[a,x_0]} g + \int_{[a,x]} g \]

for all $x \in [a,b]$. To finish the proof, we have to show that $f_n$ converges
uniformly to $f$, and that $f$ is differentiable with derivative $g$; this shall
be done in Exercise 3.7.1.


Remark 3.7.2. It turns out that Theorem 3.7.1 is still true when the
functions $f_n'$ are not assumed to be continuous, but the proof is more
difficult; see Exercise 3.7.2.


By combining this theorem with the Weierstrass M-test, we obtain 


Corollary 3.7.3. Let $[a,b]$ be an interval, and for every integer $n \geq 1$,
let $f_n : [a,b] \to \mathbf{R}$ be a differentiable function whose derivative
$f_n' : [a,b] \to \mathbf{R}$ is continuous. Suppose that the series $\sum_{n=1}^\infty \|f_n'\|_\infty$ is
absolutely convergent, where

\[ \|f_n'\|_\infty := \sup_{x \in [a,b]} |f_n'(x)| \]

is the sup norm of $f_n'$, as defined in Definition 3.5.5. Suppose also that
the series $\sum_{n=1}^\infty f_n(x_0)$ is convergent for some $x_0 \in [a,b]$. Then the
series $\sum_{n=1}^\infty f_n$ converges uniformly on $[a,b]$ to a differentiable function,
and in fact

\[ \frac{d}{dx} \sum_{n=1}^\infty f_n(x) = \sum_{n=1}^\infty \frac{d}{dx} f_n(x) \]

for all $x \in [a,b]$.


Proof. See Exercise 3.7.3. 


We now pause to give an example of a function which is continu- 
ous everywhere, but differentiable nowhere (this particular example was 
discovered by Weierstrass). Again, we will presume knowledge of the 
trigonometric functions, which will be covered rigorously in Section 4.7. 


Example 3.7.4. Let $f : \mathbf{R} \to \mathbf{R}$ be the function

\[ f(x) := \sum_{n=1}^\infty 4^{-n} \cos(32^n \pi x). \]

Note that this series is uniformly convergent, thanks to the Weierstrass
M-test, and since each individual function $4^{-n} \cos(32^n \pi x)$ is continuous,
the function $f$ is also continuous. However, it is not differentiable
(Exercise 4.7.10); in fact it is a nowhere differentiable function, one which
is not differentiable at any point, despite being continuous everywhere!
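

The uniform convergence claimed here can be illustrated with partial sums, reading the series as the sum of 4^(-n) cos(32^n π x). In the sketch below the function names and sample points are illustrative assumptions. Note that for larger n the argument 32^n π x is so large that floating-point cosines lose accuracy, but every computed value still lies in [-1, 1], so the M-test tail bound remains valid for the computed sums.

```python
import math

def weierstrass_partial(x, N):
    """Partial sum of 4**(-n) * cos(32**n * pi * x) for n = 1, ..., N."""
    return sum(4.0**(-n) * math.cos(32.0**n * math.pi * x) for n in range(1, N + 1))

def tail_bound(N):
    """Sum of the sup norms 4**(-n) over n > N, which equals 4**(-N) / 3."""
    return 4.0**(-N) / 3.0

for x in (0.0, 0.1, 0.37, 1.0):
    diff = abs(weierstrass_partial(x, 8) - weierstrass_partial(x, 4))
    # The difference of two partial sums is bounded by the tail, uniformly in x.
    print(x, diff, tail_bound(4))
```

The bound on the difference of partial sums is the same at every x, which is the Cauchy criterion for uniform convergence that the M-test provides.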
— Exercises —


Exercise 3.7.1. Complete the proof of Theorem 3.7.1. Compare this theorem 
with Example 1.2.10, and explain why this example does not contradict the 
theorem. 


Exercise 3.7.2. Prove Theorem 3.7.1 without assuming that $f_n'$ is continuous.
(This means that you cannot use the fundamental theorem of calculus. However,
the mean value theorem (Corollary 10.2.9) is still available. Use this to
show that if $d_\infty(f_n', f_m') \leq \varepsilon$, then
$|(f_n(x) - f_m(x)) - (f_n(x_0) - f_m(x_0))| \leq \varepsilon |x - x_0|$ for all $x \in [a,b]$,
and then use this to complete the proof of Theorem 3.7.1.)


Exercise 3.7.3. Prove Corollary 3.7.3.


3.8 Uniform approximation by polynomials 


As we have just seen, continuous functions can be very badly behaved, 
for instance they can be nowhere differentiable (Example 3.7.4). On 
the other hand, functions such as polynomials are always very well be- 
haved, in particular being always differentiable. Fortunately, while most 
continuous functions are not as well behaved as polynomials, they can 
always be uniformly approximated by polynomials; this important (but 
difficult) result is known as the Weierstrass approximation theorem, and 
is the subject of this section. 


Definition 3.8.1. Let $[a,b]$ be an interval. A polynomial on $[a,b]$ is a
function $f : [a,b] \to \mathbf{R}$ of the form $f(x) := \sum_{j=0}^n c_j x^j$, where $n \geq 0$ is
an integer and $c_0, \dots, c_n$ are real numbers. If $c_n \neq 0$, then $n$ is called
the degree of $f$.


Example 3.8.2. The function $f : [1,2] \to \mathbf{R}$ defined by
$f(x) := 3x^4 + 2x^3 - 4x + 5$ is a polynomial on $[1,2]$ of degree 4.


Theorem 3.8.3 (Weierstrass approximation theorem). If $[a,b]$ is an
interval, $f : [a,b] \to \mathbf{R}$ is a continuous function, and $\varepsilon > 0$, then there
exists a polynomial $P$ on $[a,b]$ such that $d_\infty(P, f) \leq \varepsilon$ (i.e.,
$|P(x) - f(x)| \leq \varepsilon$ for all $x \in [a,b]$).


Another way of stating this theorem is as follows. Recall that
$C([a,b] \to \mathbf{R})$ was the space of continuous functions from $[a,b]$ to $\mathbf{R}$,
with the uniform metric $d_\infty$. Let $P([a,b] \to \mathbf{R})$ be the space of all
polynomials on $[a,b]$; this is a subspace of $C([a,b] \to \mathbf{R})$, since all
polynomials are continuous (Exercise 9.4.7). The Weierstrass approximation
theorem then asserts that every continuous function is an adherent point
of $P([a,b] \to \mathbf{R})$; or in other words, that the closure of the space of
polynomials is the space of continuous functions:

\[ \overline{P([a,b] \to \mathbf{R})} = C([a,b] \to \mathbf{R}). \]


In particular, every continuous function on [a,b] is the uniform limit of 
polynomials. Another way of saying this is that the space of polynomials 
is dense in the space of continuous functions, in the uniform topology. 

The proof of the Weierstrass approximation theorem is somewhat 
complicated and will be done in stages. We first need the notion of an 
approximation to the identity. 


Definition 3.8.4 (Compactly supported functions). Let $[a,b]$ be an
interval. A function $f : \mathbf{R} \to \mathbf{R}$ is said to be supported on $[a,b]$ iff
$f(x) = 0$ for all $x \notin [a,b]$. We say that $f$ is compactly supported iff it is
supported on some interval $[a,b]$. If $f$ is continuous and supported on
$[a,b]$, we define the improper integral $\int_{-\infty}^\infty f$ to be
$\int_{-\infty}^\infty f := \int_{[a,b]} f$.


Note that a function can be supported on more than one interval,
for instance a function which is supported on $[3,4]$ is also automatically
supported on $[2,5]$ (why?). In principle, this might mean that our
definition of $\int_{-\infty}^\infty f$ is not well defined, however this is not the case:


Lemma 3.8.5. If $f : \mathbf{R} \to \mathbf{R}$ is continuous and supported on an interval
$[a,b]$, and is also supported on another interval $[c,d]$, then
$\int_{[a,b]} f = \int_{[c,d]} f$.


Proof. See Exercise 3.8.1.


Definition 3.8.6 (Approximation to the identity). Let $\varepsilon > 0$ and
$0 < \delta < 1$. A function $f : \mathbf{R} \to \mathbf{R}$ is said to be an $(\varepsilon, \delta)$-approximation to
the identity if it obeys the following three properties:


(a) $f$ is supported on $[-1,1]$, and $f(x) \geq 0$ for all $-1 \leq x \leq 1$.
(b) $f$ is continuous, and $\int_{-\infty}^\infty f = 1$.
(c) $|f(x)| \leq \varepsilon$ for all $\delta \leq |x| \leq 1$.


Remark 3.8.7. For those of you who are familiar with the Dirac delta 
function, approximations to the identity are ways to approximate this 
(very discontinuous) delta function by a continuous function (which is 
easier to analyze). We will not however discuss the Dirac delta function 
in this text.


Our proof of the Weierstrass approximation theorem relies on three
key facts. The first fact is that polynomials can be approximations to
the identity:


Lemma 3.8.8 (Polynomials can approximate the identity). For every
$\varepsilon > 0$ and $0 < \delta < 1$ there exists an $(\varepsilon, \delta)$-approximation to the identity
which is a polynomial $P$ on $[-1,1]$.


Proof. See Exercise 3.8.2. 
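

For intuition, one standard candidate is the function equal to c(1 - x^2)^N on [-1,1] and 0 elsewhere, with c chosen so that the integral is 1. This choice is an assumption made for this sketch and is not stated in this section; the intended proof is left to the exercise. The code computes the normalising integral exactly by expanding with the binomial theorem, then evaluates the function at |x| = δ, where it is largest on the region δ ≤ |x| ≤ 1.

```python
from fractions import Fraction
from math import comb

def kernel_integral(N):
    """Exact integral over [-1, 1] of (1 - x**2)**N, via the binomial expansion."""
    # (1 - x^2)^N = sum_k C(N, k) (-1)^k x^(2k), and the integral of x^(2k) over [-1, 1] is 2/(2k+1).
    return sum(Fraction((-1)**k * comb(N, k) * 2, 2 * k + 1) for k in range(N + 1))

def kernel(N, x):
    """c * (1 - x**2)**N on [-1, 1] and 0 outside, with c chosen so that the integral is 1."""
    if abs(x) > 1:
        return 0.0
    c = 1 / kernel_integral(N)
    return float(c) * (1.0 - x * x)**N

# With N = 20 and delta = 0.5 (arbitrary choices), the value at |x| = delta is already small.
print(float(kernel_integral(20)), kernel(20, 0.5))
```

Since (1 - x^2)^N is decreasing in |x|, the value at |x| = δ bounds the function on all of δ ≤ |x| ≤ 1, so increasing N makes property (c) hold for any prescribed ε.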


We will use these polynomial approximations to the identity to ap- 
proximate continuous functions by polynomials. We will need the fol- 
lowing important notion of a convolution. 


Definition 3.8.9 (Convolution). Let $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$ be
continuous, compactly supported functions. We define the convolution
$f * g : \mathbf{R} \to \mathbf{R}$ of $f$ and $g$ to be the function

\[ (f * g)(x) := \int_{-\infty}^\infty f(y) g(x - y) \, dy. \]

Note that if $f$ and $g$ are continuous and compactly supported, then
for each $x$ the function $f(y) g(x - y)$ (thought of as a function of $y$) is
also continuous and compactly supported, so the above definition makes
sense.


Remark 3.8.10. Convolutions play an important role in Fourier anal- 
ysis and in partial differential equations (PDE), and are also important 
in physics, engineering, and signal processing. An in-depth study of 
convolution is beyond the scope of this text; only a brief treatment will 
be given here. 


Proposition 3.8.11 (Basic properties of convolution). Let $f : \mathbf{R} \to \mathbf{R}$,
$g : \mathbf{R} \to \mathbf{R}$, and $h : \mathbf{R} \to \mathbf{R}$ be continuous, compactly supported
functions. Then the following statements are true.


(a) The convolution $f * g$ is also a continuous, compactly supported
function.


(b) (Convolution is commutative) We have $f * g = g * f$; in other words

\[ f * g(x) = \int_{-\infty}^\infty f(y) g(x - y) \, dy = \int_{-\infty}^\infty g(y) f(x - y) \, dy = g * f(x). \]


(c) (Convolution is linear) We have $f * (g + h) = f * g + f * h$. Also,
for any real number $c$, we have $f * (cg) = (cf) * g = c(f * g)$.


Proof. See Exercise 3.8.4. 
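

A numerical sketch of the convolution integral, using a midpoint Riemann sum, can make the definition and the commutativity property tangible. The "tent" functions, the integration window [-4, 4], and the step count below are assumptions chosen for this illustration; the window only needs to contain the supports involved.

```python
def tent(center, halfwidth):
    """Continuous 'tent' function supported on [center - halfwidth, center + halfwidth]."""
    def h(x):
        return max(0.0, 1.0 - abs(x - center) / halfwidth)
    return h

def convolve(f, g, x, lo=-4.0, hi=4.0, steps=8000):
    """Midpoint Riemann sum approximating the integral of f(y) g(x - y) dy over [lo, hi]."""
    dy = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        total += f(y) * g(x - y)
    return total * dy

f = tent(0.5, 0.5)   # supported on [0, 1]
g = tent(0.0, 1.0)   # supported on [-1, 1]
for x in (-0.5, 0.25, 1.0, 1.5):
    # The two columns agree up to discretisation error, illustrating f * g = g * f.
    print(x, convolve(f, g, x), convolve(g, f, x))
```

Evaluating at a point such as x = 3 returns 0, since there f(y) and g(x - y) are never simultaneously non-zero; this reflects the compact support asserted in part (a).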


Remark 3.8.12. There are many other important properties of convolution,
for instance it is associative, $(f * g) * h = f * (g * h)$, and it
commutes with derivatives, $(f * g)' = f' * g = f * g'$, when $f$ and $g$
are differentiable. The Dirac delta function $\delta$ mentioned earlier is an
identity for convolution: $f * \delta = \delta * f = f$. These results are slightly
harder to prove than the ones in Proposition 3.8.11, however, and we
will not need them in this text.


As mentioned earlier, the proof of the Weierstrass approximation 
theorem relies on three facts. The second key fact is that convolution 
with polynomials produces another polynomial: 


Lemma 3.8.13. Let $f : \mathbf{R} \to \mathbf{R}$ be a continuous function supported on
$[0,1]$, and let $g : \mathbf{R} \to \mathbf{R}$ be a continuous function supported on $[-1,1]$
which is a polynomial on $[-1,1]$. Then $f * g$ is a polynomial on $[0,1]$.
(Note however that it may be non-polynomial outside of $[0,1]$.)

Proof. Since $g$ is polynomial on $[-1,1]$, we may find an integer $n \geq 0$
and real numbers $c_0, c_1, \dots, c_n$ such that

\[ g(x) = \sum_{j=0}^n c_j x^j \quad \text{for all } x \in [-1,1]. \]

On the other hand, for all $x \in [0,1]$, we have

\[ f * g(x) = \int_{-\infty}^\infty f(y) g(x - y) \, dy = \int_{[0,1]} f(y) g(x - y) \, dy \]

since $f$ is supported on $[0,1]$. Since $x \in [0,1]$ and the variable of integration
$y$ is also in $[0,1]$, we have $x - y \in [-1,1]$. Thus we may substitute
in our formula for $g$ to obtain

\[ f * g(x) = \int_{[0,1]} f(y) \sum_{j=0}^n c_j (x - y)^j \, dy. \]

We expand this using the binomial formula (Exercise 7.1.4) to obtain

\[ f * g(x) = \int_{[0,1]} f(y) \sum_{j=0}^n c_j \sum_{k=0}^j \frac{j!}{k!(j-k)!} x^k (-y)^{j-k} \, dy. \]

We can interchange the two summations (by Corollary 7.1.14) to obtain

\[ f * g(x) = \int_{[0,1]} f(y) \sum_{k=0}^n \sum_{j=k}^n c_j \frac{j!}{k!(j-k)!} x^k (-y)^{j-k} \, dy \]

(why did the limits of summation change? It may help to plot $j$ and $k$
on a graph). Now we interchange the $k$ summation with the integral,
and observe that $x^k$ is independent of $y$, to obtain

\[ f * g(x) = \sum_{k=0}^n x^k \int_{[0,1]} f(y) \sum_{j=k}^n c_j \frac{j!}{k!(j-k)!} (-y)^{j-k} \, dy. \]

If we thus define

\[ C_k := \int_{[0,1]} f(y) \sum_{j=k}^n c_j \frac{j!}{k!(j-k)!} (-y)^{j-k} \, dy \]

for each $k = 0, \dots, n$, then $C_k$ is a number which is independent of $x$,
and we have

\[ f * g(x) = \sum_{k=0}^n C_k x^k \]

for all $x \in [0,1]$. Thus $f * g$ is a polynomial on $[0,1]$.


The third key fact is that if one convolves a uniformly continuous 
function with an approximation to the identity, we obtain a new function 
which is close to the original function (which explains the terminology 
“approximation to the identity” ): 


Lemma 3.8.14. Let $f : \mathbf{R} \to \mathbf{R}$ be a continuous function supported on
$[0,1]$, which is bounded by some $M > 0$ (i.e., $|f(x)| \leq M$ for all $x \in \mathbf{R}$),
and let $\varepsilon > 0$ and $0 < \delta < 1$ be such that one has $|f(x) - f(y)| < \varepsilon$
whenever $x, y \in \mathbf{R}$ and $|x - y| < \delta$. Let $g$ be any $(\varepsilon, \delta)$-approximation to
the identity. Then we have

\[ |f * g(x) - f(x)| \leq (1 + 4M)\varepsilon \]

for all $x \in [0,1]$.


Proof. See Exercise 3.8.6.


Combining these together, we obtain a preliminary version of the
Weierstrass approximation theorem:


Corollary 3.8.15 (Weierstrass approximation theorem I). Let $f : \mathbf{R} \to \mathbf{R}$
be a continuous function supported on $[0,1]$. Then for every $\varepsilon > 0$,
there exists a function $P : \mathbf{R} \to \mathbf{R}$ which is polynomial on $[0,1]$ and such
that $|P(x) - f(x)| \leq \varepsilon$ for all $x \in [0,1]$.


Proof. See Exercise 3.8.7. 


Now we perform a series of modifications to convert Corollary 3.8.15 
into the actual Weierstrass approximation theorem. We first need a 
simple lemma. 


Lemma 3.8.16. Let $f : [0,1] \to \mathbf{R}$ be a continuous function which
equals 0 on the boundary of $[0,1]$, i.e., $f(0) = f(1) = 0$. Let $F : \mathbf{R} \to \mathbf{R}$
be the function defined by setting $F(x) := f(x)$ for $x \in [0,1]$ and
$F(x) := 0$ for $x \notin [0,1]$. Then $F$ is also continuous.


Proof. See Exercise 3.8.9. 


Remark 3.8.17. The function F’ obtained in Lemma 3.8.16 is some- 
times known as the extension of f by zero. 


From Corollary 3.8.15 and Lemma 3.8.16 we immediately obtain 


Corollary 3.8.18 (Weierstrass approximation theorem II). Let
$f : [0,1] \to \mathbf{R}$ be a continuous function supported on $[0,1]$ such that
$f(0) = f(1) = 0$. Then for every $\varepsilon > 0$ there exists a polynomial
$P : [0,1] \to \mathbf{R}$ such that $|P(x) - f(x)| \leq \varepsilon$ for all $x \in [0,1]$.


Now we strengthen Corollary 3.8.18 by removing the assumption
that $f(0) = f(1) = 0$.


Corollary 3.8.19 (Weierstrass approximation theorem III). Let
$f : [0,1] \to \mathbf{R}$ be a continuous function supported on $[0,1]$. Then for every
$\varepsilon > 0$ there exists a polynomial $P : [0,1] \to \mathbf{R}$ such that $|P(x) - f(x)| \leq \varepsilon$
for all $x \in [0,1]$.


Proof. Let $F : [0,1] \to \mathbf{R}$ denote the function

\[ F(x) := f(x) - f(0) - x(f(1) - f(0)). \]

Observe that $F$ is also continuous (why?), and that $F(0) = F(1) = 0$.
By Corollary 3.8.18, we can thus find a polynomial $Q : [0,1] \to \mathbf{R}$ such
that $|Q(x) - F(x)| \leq \varepsilon$ for all $x \in [0,1]$. But

\[ Q(x) - F(x) = Q(x) + f(0) + x(f(1) - f(0)) - f(x), \]

so the claim follows by setting $P$ to be the polynomial
$P(x) := Q(x) + f(0) + x(f(1) - f(0))$.


Finally, we can prove the full Weierstrass approximation theorem. 


Proof of Theorem 3.8.3. Let f : [a,b] → R be a continuous function on 
[a,b]. Let g : [0,1] → R denote the function 

$$g(x) := f(a + (b-a)x) \quad \text{for all } x \in [0,1].$$

Observe then that 

$$f(y) = g((y-a)/(b-a)) \quad \text{for all } y \in [a,b].$$

The function g is continuous on [0,1] (why?), and so by Corollary 3.8.19 
we may find a polynomial Q : [0,1] → R such that |Q(x) − g(x)| ≤ ε for 
all x ∈ [0,1]. In particular, for any y ∈ [a,b], we have 

$$|Q((y-a)/(b-a)) - g((y-a)/(b-a))| \le \varepsilon.$$

If we thus set P(y) := Q((y−a)/(b−a)), then we observe that P is also 
a polynomial (why?), and so we have |P(y) − f(y)| ≤ ε for all y ∈ [a,b], 
as desired. 
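
The change of variables used in this proof is easy to try numerically. The following is only a sketch under stated assumptions: it uses Bernstein polynomials to produce a polynomial Q on [0,1], which is a different construction from the convolution argument of this section, and the names `bernstein`, `approximate_on_interval` and `sup_error_on_grid` are hypothetical helpers introduced here for illustration.

```python
from math import comb


def bernstein(g, n):
    """Return the degree-n Bernstein polynomial of g on [0, 1] as a callable Q.

    Assumption: Bernstein polynomials stand in for the polynomial Q whose
    existence Corollary 3.8.19 guarantees; the text builds Q differently.
    """
    values = [g(k / n) for k in range(n + 1)]

    def Q(x):
        return sum(values[k] * comb(n, k) * x**k * (1 - x) ** (n - k)
                   for k in range(n + 1))

    return Q


def approximate_on_interval(f, a, b, n):
    """Approximate f on [a, b] by P(y) := Q((y - a) / (b - a))."""
    g = lambda x: f(a + (b - a) * x)       # g(x) := f(a + (b - a) x) on [0, 1]
    Q = bernstein(g, n)
    return lambda y: Q((y - a) / (b - a))  # P is again a polynomial in y


def sup_error_on_grid(f, P, a, b, samples=1000):
    """Largest |P(y) - f(y)| over an evenly spaced grid in [a, b]."""
    grid = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(P(y) - f(y)) for y in grid)


if __name__ == "__main__":
    f = lambda y: y * y
    for n in (10, 100):
        P = approximate_on_interval(f, 1.0, 3.0, n)
        print(n, sup_error_on_grid(f, P, 1.0, 3.0))
```

The grid maximum only samples the sup norm, so it illustrates the statement rather than proving it; the printed error should shrink as n grows.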


Remark 3.8.20. Note that the Weierstrass approximation theorem 
only works on bounded intervals [a,b]; continuous functions on R cannot, 
in general, be uniformly approximated by polynomials. For instance, the 
exponential function f : R → R defined by f(x) := $e^x$ (which we shall study 
rigorously in Section 4.5) cannot be approximated by any polynomial, 
because exponential functions grow faster than any polynomial (Exer- 
cise 4.5.9) and so there is no way one can even make the sup metric 
between f and a polynomial finite. 


Remark 3.8.21. There is a generalization of the Weierstrass approx- 
imation theorem to higher dimensions: if K is any compact subset of 
$\mathbf{R}^n$ (with the Euclidean metric $d_{l^2}$), and f : K → R is a continuous 
function, then for every ε > 0 there exists a polynomial P : K → R 
of n variables $x_1, \ldots, x_n$ such that $d_\infty(f, P) < \varepsilon$. This general theorem 
can be proven by a more complicated variant of the arguments here, 
but we will not do so. (There is in fact an even more general version 
of this theorem applicable to an arbitrary metric space, known as the 
Stone-Weierstrass theorem, but this is beyond the scope of this text.) 


— Exercises — 
Exercise 3.8.1. Prove Lemma 3.8.5. 


Exercise 3.8.2. (a) Prove that for any real number 0 ≤ y ≤ 1 and any nat- 
ural number n ≥ 0, that $(1 - y)^n \ge 1 - ny$. (Hint: induct on n. Alter- 
natively, differentiate with respect to y.) 

(b) Show that $\int_{-1}^{1} (1 - x^2)^n \, dx \ge \frac{1}{\sqrt{n}}$. (Hint: for |x| ≤ 1/√n, use part (a); for 
|x| ≥ 1/√n, just use the fact that $(1 - x^2)$ is positive. It is also possible 
to proceed via trigonometric substitution, but I would not recommend 
this unless you know what you are doing.) 

(c) Prove Lemma 3.8.8. (Hint: choose f(x) to equal $c(1 - x^2)^N$ for x ∈ [−1,1] 
and to equal zero for x ∉ [−1,1], where N is a large number and c 
is chosen so that f has integral 1, and use (b).) 


Exercise 3.8.3. Let f : R → R be a compactly supported, continuous function. 
Show that f is bounded and uniformly continuous. (Hint: the idea is to use 
Proposition 2.3.2 and Theorem 2.3.5, but one must first deal with the issue 
that the domain R of f is non-compact.) 


Exercise 3.8.4. Prove Proposition 3.8.11. (Hint: to show that f*g is continuous, 
use Exercise 3.8.3.) 


Exercise 3.8.5. Let f : R → R and g : R → R be continuous, compactly 
supported functions. Suppose that f is supported on the interval [0,1], and g 
is constant on the interval [0,2] (i.e., there is a real number c such that g(x) = c 
for all x ∈ [0,2]). Show that the convolution f ∗ g is constant on the interval 
[1,2]. 

Exercise 3.8.6. (a) Let g be an (ε, δ) approximation to the identity. Show 
that $1 - 2\varepsilon \le \int_{[-\delta,\delta]} g \le 1$. 

(b) Prove Lemma 3.8.14. (Hint: begin with the identity 

$$f * g(x) = \int f(x-y)g(y)\,dy = \int_{[-\delta,\delta]} f(x-y)g(y)\,dy + \int_{[\delta,1]} f(x-y)g(y)\,dy + \int_{[-1,-\delta]} f(x-y)g(y)\,dy.$$

The idea is to show that the first integral is close to f(x), and that the 
second and third integrals are very small. To achieve the former task, 
use (a) and the fact that f(x) and f(x − y) are within ε of each other; 
to achieve the latter task, use property (c) of the approximation to the 
identity and the fact that f is bounded.) 


Exercise 3.8.7. Prove Corollary 3.8.15. (Hint: combine Exercise 3.8.3, Lemma 
3.8.8, Lemma 3.8.13, and Lemma 3.8.14.) 


Exercise 3.8.8. Let f : [0,1] → R be a continuous function, and suppose that 
$\int_{[0,1]} f(x) x^n \, dx = 0$ for all non-negative integers n = 0, 1, 2, .... Show that f 
must be the zero function f ≡ 0. (Hint: first show that $\int_{[0,1]} f(x)P(x)\,dx = 0$ 
for all polynomials P. Then, using the Weierstrass approximation theorem, 
show that $\int_{[0,1]} f(x)f(x)\,dx = 0$.) 


Exercise 3.8.9. Prove Lemma 3.8.16. 


Chapter 4 


Power series 


4.1 Formal power series 


We now discuss an important subclass of series of functions, that of 
power series. As in earlier chapters, we begin by introducing the notion 
of a formal power series, and then focus in later sections on when the 
series converges to a meaningful function, and what one can say about 
the function obtained in this manner. 


Definition 4.1.1 (Formal power series). Let a be a real number. A 
formal power series centered at a is any series of the form 

$$\sum_{n=0}^\infty c_n (x-a)^n$$

where $c_0, c_1, \ldots$ is a sequence of real numbers (not depending on x); we 
refer to $c_n$ as the $n^{\text{th}}$ coefficient of this series. Note that each term 
$c_n(x-a)^n$ in this series is a function of a real variable x. 


Example 4.1.2. The series $\sum_{n=0}^\infty n!(x-2)^n$ is a formal power series 
centered at 2. The series $\sum_{n=0}^\infty 2^x (x-3)^n$ is not a formal power series, 
since the coefficients $2^x$ depend on x. 


We call these power series formal because we do not yet assume that 
these series converge for any x. However, these series are automatically 
guaranteed to converge when x = a (why?). In general, the closer x 
gets to a, the easier it is for this series to converge. To make this more 
precise, we need the following definition. 


Definition 4.1.3 (Radius of convergence). Let $\sum_{n=0}^\infty c_n(x-a)^n$ be a 
formal power series. We define the radius of convergence R of this series 
to be the quantity 

$$R := \frac{1}{\limsup_{n\to\infty} |c_n|^{1/n}}$$

where we adopt the convention that $\frac{1}{0} = +\infty$ and $\frac{1}{+\infty} = 0$. 


Remark 4.1.4. Each number $|c_n|^{1/n}$ is non-negative, so the limit 
$\limsup_{n\to\infty} |c_n|^{1/n}$ can take on any value from 0 to +∞, inclusive. Thus 
R can also take on any value between 0 and +∞ inclusive (in particular 
it is not necessarily a real number). Note that the radius of convergence 
always exists, even if the sequence $|c_n|^{1/n}$ is not convergent, because the 
lim sup of any sequence always exists (though it might be +∞ or −∞). 


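Example 4.1.5. The series $\sum_{n=0}^\infty n(-2)^n (x-3)^n$ has radius of conver- 
gence 

$$\frac{1}{\limsup_{n\to\infty} |n(-2)^n|^{1/n}} = \frac{1}{\limsup_{n\to\infty} 2n^{1/n}} = \frac{1}{2}.$$

The series $\sum_{n=0}^\infty 2^{n^2} (x+2)^n$ has radius of convergence 

$$\frac{1}{\limsup_{n\to\infty} |2^{n^2}|^{1/n}} = \frac{1}{\limsup_{n\to\infty} 2^n} = \frac{1}{+\infty} = 0.$$

The series $\sum_{n=0}^\infty 2^{-n^2} (x+2)^n$ has radius of convergence 

$$\frac{1}{\limsup_{n\to\infty} |2^{-n^2}|^{1/n}} = \frac{1}{\limsup_{n\to\infty} 2^{-n}} = \frac{1}{0} = +\infty.$$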
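
The three computations in Example 4.1.5 can be explored numerically. The sketch below is a heuristic only: it evaluates $1/|c_n|^{1/n}$ at a single large index n rather than taking a lim sup, and the helper name `radius_estimate` and the choice to pass $\log|c_n|$ (to avoid overflow) are assumptions made for this illustration.

```python
import math


def radius_estimate(log_abs_coeff, n):
    """Estimate 1 / |c_n|^(1/n) from log|c_n|, for one (large) index n.

    Working with logarithms avoids overflow for coefficients like 2^(n^2).
    This is only a heuristic: the true radius uses the lim sup over all n.
    """
    return math.exp(-log_abs_coeff(n) / n)


# log|c_n| for the three series of the example (valid for n >= 1).
log_c1 = lambda n: math.log(n) + n * math.log(2)   # c_n = n (-2)^n
log_c2 = lambda n: n * n * math.log(2)             # c_n = 2^(n^2)
log_c3 = lambda n: -n * n * math.log(2)            # c_n = 2^(-n^2)

if __name__ == "__main__":
    for name, log_c in (("n(-2)^n", log_c1),
                        ("2^(n^2)", log_c2),
                        ("2^(-n^2)", log_c3)):
        print(name, [radius_estimate(log_c, n) for n in (10, 100, 500)])
```

As n grows, the first estimate drifts toward 1/2, the second collapses toward 0, and the third blows up, in line with the radii 1/2, 0 and +∞ found above.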


The significance of the radius of convergence is the following. 


Theorem 4.1.6. Let $\sum_{n=0}^\infty c_n(x-a)^n$ be a formal power series, and let 
R be its radius of convergence. 


(a) (Divergence outside of the radius of convergence) If x ∈ R is such 
that |x − a| > R, then the series $\sum_{n=0}^\infty c_n(x-a)^n$ is divergent for 
that value of x. 


(b) (Convergence inside the radius of convergence) If x ∈ R is such 
that |x − a| < R, then the series $\sum_{n=0}^\infty c_n(x-a)^n$ is absolutely 
convergent for that value of x. 


For parts (c)-(e) we assume that R > 0 (i.e., the series converges at 
at least one other point than x = a). Let f : (a − R, a + R) → R be the 
function $f(x) := \sum_{n=0}^\infty c_n(x-a)^n$; this function is guaranteed to exist 
by (b). 


(c) (Uniform convergence on compact sets) For any 0 < r < R, the 
series $\sum_{n=0}^\infty c_n(x-a)^n$ converges uniformly to f on the compact 
interval [a − r, a + r]. In particular, f is continuous on (a − R, a + R). 


(d) (Differentiation of power series) The function f is differentiable 
on (a − R, a + R), and for any 0 < r < R, the series $\sum_{n=1}^\infty n c_n (x-a)^{n-1}$ 
converges uniformly to f′ on the interval [a − r, a + r]. 


(e) (Integration of power series) For any closed interval [y,z] con- 
tained in (a − R, a + R), we have 

$$\int_{[y,z]} f = \sum_{n=0}^\infty c_n \frac{(z-a)^{n+1} - (y-a)^{n+1}}{n+1}.$$


Proof. See Exercise 4.1.1. 


Parts (a) and (b) of Theorem 4.1.6 give another way to 
find the radius of convergence, by using your favorite convergence test 
to work out the range of x for which the power series converges: 


Example 4.1.7. Consider the power series $\sum_{n=0}^\infty n(x-1)^n$. The ratio 
test shows that this series converges when |x — 1| < 1 and diverges 
when |x — 1| > 1 (why?). Thus the only possible value for the radius 
of convergence is R = 1 (if R < 1, then we have contradicted Theorem 
4.1.6(a); if R > 1, then we have contradicted Theorem 4.1.6(b)). 
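
As a hint for the "(why?)" in Example 4.1.7, the ratio of consecutive terms can be written out explicitly. This short computation is only a sketch and assumes x ≠ 1 and n ≥ 1, so that the denominator is non-zero:

```latex
\[
  \frac{|(n+1)(x-1)^{n+1}|}{|n(x-1)^{n}|}
  = \frac{n+1}{n}\,|x-1|
  \;\longrightarrow\; |x-1| \quad \text{as } n \to \infty .
\]
```

The ratio test then gives convergence when |x − 1| < 1 and divergence when |x − 1| > 1.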


Remark 4.1.8. Theorem 4.1.6 is silent on what happens when |x —a| = 
R, i.e., at the points a − R and a + R. Indeed, one can have either 
convergence or divergence at those points; see Exercise 4.1.2. 


Remark 4.1.9. Note that while Theorem 4.1.6 assures us that the 
power series $\sum_{n=0}^\infty c_n(x-a)^n$ will converge pointwise on the interval 
(a − R, a + R), it need not converge uniformly on that interval (see 
Exercise 4.1.2(e)). On the other hand, Theorem 4.1.6(c) assures us that 
the power series will converge uniformly on any smaller interval 
[a − r, a + r]. In particular, being uniformly convergent on every closed 
subinterval of (a − R, a + R) is not enough to guarantee being uniformly 
convergent on all of (a − R, a + R). 


— Exercises — 


Exercise 4.1.1. Prove Theorem 4.1.6. (Hints: for (a) and (b), use the root test 
(Theorem 7.5.1). For (c), use the Weierstrass M-test (Theorem 3.5.7). For (d), 
use Theorem 3.7.1. For (e), use Corollary 3.6.2.) 


Exercise 4.1.2. Give examples of a formal power series $\sum_{n=0}^\infty c_n x^n$ centered at 
0 with radius of convergence 1, which 

(a) diverges at both x = 1 and x = −1; 

(b) diverges at x = 1 but converges at x = −1; 

(c) converges at x = 1 but diverges at x = −1; 

(d) converges at both x = 1 and x = −1; 

(e) converges pointwise on (−1,1), but does not converge uniformly on 
(−1,1). 


4.2 Real analytic functions 


A function f(x) which is lucky enough to be representable as a power 
series has a special name; it is a real analytic function. 


Definition 4.2.1 (Real analytic functions). Let E be a subset of R, 
and let f : E → R be a function. If a is an interior point of E, we say 
that f is real analytic at a if there exists an open interval (a − r, a + r) in 
E for some r > 0 such that there exists a power series $\sum_{n=0}^\infty c_n(x-a)^n$ 
centered at a which has a radius of convergence greater than or equal to 
r, and which converges to f on (a − r, a + r). If E is an open set, and 
f is real analytic at every point a of E, we say that f is real analytic 
on E. 


Example 4.2.2. Consider the function f : R\{1} → R defined by 
f(x) := 1/(1 − x). This function is real analytic at 0 because we have 
a power series $\sum_{n=0}^\infty x^n$ centred at 0 which converges to 1/(1 − x) = 
f(x) on the interval (−1,1). This function is also real analytic at 2 
because we have a power series $\sum_{n=0}^\infty (-1)^{n+1}(x-2)^n$ which converges to 
$\frac{-1}{1-(-(x-2))} = \frac{-1}{x-1} = f(x)$ on the interval (1,3) (why? use Lemma 7.3.3). 
In fact this function is real analytic on all of R\{1}; see Exercise 4.2.2. 
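
The two expansions in Example 4.2.2 can be compared against f numerically. This is a minimal sketch rather than a proof: `partial_sum` is a hypothetical helper that truncates a power series after finitely many terms, and the sample points 0.5 and 2.5 are arbitrary choices inside the respective intervals of convergence.

```python
def partial_sum(coeff, center, x, terms):
    """Sum coeff(n) * (x - center)^n for n = 0, ..., terms - 1."""
    return sum(coeff(n) * (x - center) ** n for n in range(terms))


f = lambda x: 1 / (1 - x)

if __name__ == "__main__":
    # Series centered at 0 with coefficients 1, evaluated inside (-1, 1).
    print(partial_sum(lambda n: 1, 0, 0.5, 60), f(0.5))
    # Series centered at 2 with coefficients (-1)^(n+1), evaluated inside (1, 3).
    print(partial_sum(lambda n: (-1) ** (n + 1), 2, 2.5, 60), f(2.5))
```

Both printed pairs should agree to many decimal places, since each series is geometric with ratio of absolute value 1/2 at the chosen point.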


Remark 4.2.3. The notion of being real analytic is closely related to 
another notion, that of being complex analytic, but this is a topic for 
complex analysis, and will not be discussed here. 

We now discuss which functions are real analytic. From Theorem 
4.1.6(c) and (d) we see that if f is real analytic at a point a, then f is 
both continuous and differentiable on (a —r,a+r) for some r > 0. We 
can in fact say more: 


Definition 4.2.4 (k-times differentiability). Let E be a subset of R. We 
say a function f : E → R is once differentiable on E iff it is differentiable. 
More generally, for any k ≥ 2 we say that f : E → R is k times 
differentiable on E, or just k times differentiable, iff f is differentiable, 
and f′ is k − 1 times differentiable. If f is k times differentiable, we 
define the $k^{\text{th}}$ derivative $f^{(k)}$ : E → R by the recursive rule $f^{(1)} := f'$, 
and $f^{(k)} := (f^{(k-1)})'$ for all k ≥ 2. We also define $f^{(0)} := f$ (this is 
f differentiated 0 times), and we allow every function to be zero times 
differentiable (since clearly $f^{(0)}$ exists for every f). A function is said to 
be infinitely differentiable (or smooth) iff it is k times differentiable for 
every k ≥ 0. 


Example 4.2.5. The function f(x) := $|x|^3$ is twice differentiable on 
R, but not three times differentiable (why?). Indeed, $f^{(2)} = f'' = 6|x|$, 
which is not differentiable at 0. 
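
For readers who want to check Example 4.2.5 by hand, the derivatives can be sketched as follows. The formulas for x ≠ 0 come from writing $|x|^3$ as $x^3$ or $-x^3$; the values at 0 are assumed to be verified separately from the definition of the derivative.

```latex
\[
  f(x) = |x|^3, \qquad
  f'(x) = 3x|x|, \qquad
  f''(x) = 6|x| \qquad \text{for all } x \in \mathbf{R},
\]
\[
  \lim_{h \to 0^+} \frac{f''(h) - f''(0)}{h} = 6
  \;\ne\; -6 = \lim_{h \to 0^-} \frac{f''(h) - f''(0)}{h}.
\]
```

The one-sided difference quotients of f″ at 0 disagree, so f″ has no derivative there.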


Proposition 4.2.6 (Real analytic functions are k-times differentiable). 
Let E be a subset of R, let a be an interior point of E, and let f be 
a function which is real analytic at a, thus there is an r > 0 for which 
we have the power series expansion 

$$f(x) = \sum_{n=0}^\infty c_n (x-a)^n$$

for all x ∈ (a − r, a + r). Then for every k ≥ 0, the function f is k-times 
differentiable on (a − r, a + r), and for each k ≥ 0 the $k^{\text{th}}$ derivative is 
given by 

$$f^{(k)}(x) = \sum_{n=0}^\infty c_{n+k} (n+1)(n+2)\cdots(n+k)(x-a)^n = \sum_{n=0}^\infty c_{n+k} \frac{(n+k)!}{n!} (x-a)^n$$

for all x ∈ (a − r, a + r). 
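
The coefficient formula in Proposition 4.2.6 can be sanity-checked on a polynomial, which is a power series with only finitely many non-zero coefficients. The sketch below checks only the algebra of the formula, not any convergence statement; `differentiate` and `kth_derivative_by_formula` are hypothetical helper names, and coefficients are stored as a list $[c_0, c_1, \ldots]$ in powers of (x − a).

```python
from fractions import Fraction
from math import factorial


def differentiate(coeffs):
    """Differentiate a polynomial given by coefficients [c_0, c_1, ...] once."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def kth_derivative_by_formula(coeffs, k):
    """Coefficients of f^(k) predicted by c_{n+k} (n+k)! / n!."""
    return [coeffs[n + k] * Fraction(factorial(n + k), factorial(n))
            for n in range(len(coeffs) - k)]


if __name__ == "__main__":
    # 1 + 2t + 3t^2 + 4t^3 + 5t^4, where t stands for (x - a).
    c = [Fraction(v) for v in (1, 2, 3, 4, 5)]
    step_by_step = c
    for _ in range(2):
        step_by_step = differentiate(step_by_step)
    print(step_by_step)
    print(kth_derivative_by_formula(c, 2))
```

Differentiating term by term twice and applying the closed formula with k = 2 should print the same list of coefficients.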


Proof. See Exercise 4.2.3. 


Corollary 4.2.7 (Real analytic functions are infinitely differentiable). 
Let E be an open subset of R, and let f : E → R be a real analytic func- 
tion on E. Then f is infinitely differentiable on E. Also, all derivatives 
of f are also real analytic on E. 


Proof. For every point a ∈ E and k ≥ 0, we know from Proposition 
4.2.6 that f is k-times differentiable at a (we will have to apply Exercise 
10.1.1 k times here, why?). Thus f is k-times differentiable on E for 
every k ≥ 0 and is hence infinitely differentiable. Also, from Proposition 
4.2.6 we see that each derivative $f^{(k)}$ of f has a convergent power series 
expansion at every x ∈ E and thus $f^{(k)}$ is real analytic. 


Example 4.2.8. Consider the function f : R → R defined by f(x) := 
|x|. This function is not differentiable at x = 0, and hence cannot be 
real analytic at x = 0. It is however real analytic at every other point 
x ∈ R\{0} (why?). 


Remark 4.2.9. The converse statement to Corollary 4.2.7 is not true; 
there are infinitely differentiable functions which are not real analytic. 
See Exercise 4.5.4. 


Proposition 4.2.6 has an important corollary, due to Brook Taylor 
(1685-1731). 


Corollary 4.2.10 (Taylor’s formula). Let E be a subset of R, let a be 
an interior point of E, and let f : E → R be a function which is real 
analytic at a and has the power series expansion 

$$f(x) = \sum_{n=0}^\infty c_n (x-a)^n$$

for all x ∈ (a − r, a + r) and some r > 0. Then for any integer k ≥ 0, 
we have 

$$f^{(k)}(a) = k!\, c_k,$$

where k! := 1 × 2 × ... × k (and we adopt the convention that 0! = 1). 
In particular, we have Taylor’s formula 

$$f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x-a)^n$$

for all x in (a − r, a + r). 


Proof. See Exercise 4.2.4. 


The power series $\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n$ is sometimes called the Taylor 
series of f around a. Taylor’s formula thus asserts that if a function is 
real analytic, then it is equal to its Taylor series. 


Remark 4.2.11. Note that Taylor’s formula only works for functions 
which are real analytic; there are examples of functions which are in- 
finitely differentiable but for which Taylor’s theorem fails (see Exercise 
4.5.4). 

Another important corollary of Taylor’s formula is that a real ana- 
lytic function can have at most one power series at a point: 


Corollary 4.2.12 (Uniqueness of power series). Let E be a subset of R, 
let a be an interior point of E, and let f : E → R be a function which 
is real analytic at a. Suppose that f has two power series expansions 

$$f(x) = \sum_{n=0}^\infty c_n (x-a)^n$$

and 

$$f(x) = \sum_{n=0}^\infty d_n (x-a)^n$$

centered at a, each with a non-zero radius of convergence. Then $c_n = d_n$ 
for all n ≥ 0. 


Proof. By Corollary 4.2.10, we have $f^{(k)}(a) = k!\,c_k$ for all k ≥ 0. But 
we also have $f^{(k)}(a) = k!\,d_k$, by similar reasoning. Since k! is never zero, 
we can cancel it and obtain $c_k = d_k$ for all k ≥ 0, as desired. 


Remark 4.2.13. While a real analytic function has a unique power 
series around any given point, it can certainly have different power series 
at different points. For instance, the function f(x) := 1/(1 − x), defined on 
R − {1}, has the power series 

$$f(x) = \sum_{n=0}^\infty x^n$$

around 0, on the interval (−1,1), but also has the power series 

$$f(x) = \frac{1}{1-x} = \frac{2}{1 - 2(x - \frac{1}{2})} = \sum_{n=0}^\infty 2^{n+1} \left(x - \tfrac{1}{2}\right)^n$$

around 1/2, on the interval (0,1) (note that the above power series has 
a radius of convergence of 1/2, thanks to the root test). See also Exercise 
4.2.8. 
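
The dependence of the coefficients on the center can be made explicit with Taylor's formula. The following is a sketch that assumes f(x) = 1/(1 − x) is real analytic at the chosen center a ≠ 1 (this is the content of Exercise 4.2.2), so that Corollary 4.2.10 applies:

```latex
\[
  f^{(n)}(x) = \frac{n!}{(1-x)^{n+1}}
  \quad\Longrightarrow\quad
  c_n = \frac{f^{(n)}(a)}{n!} = \frac{1}{(1-a)^{n+1}},
\]
\[
  a = 0:\; c_n = 1, \qquad
  a = \tfrac12:\; c_n = 2^{n+1}, \qquad
  \frac{1}{\limsup_{n\to\infty} |c_n|^{1/n}} = |1-a| .
\]
```

In particular the radius of convergence at each center is the distance from that center to the excluded point 1.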


— Exercises — 


Exercise 4.2.1. Let n ≥ 0 be an integer, let c, a be real numbers, and let f be 
the function f(x) := $c(x-a)^n$. Show that f is infinitely differentiable, and 
that $f^{(k)}(x) = c\,\frac{n!}{(n-k)!}(x-a)^{n-k}$ for all integers 0 ≤ k ≤ n. What happens 
when k > n? 


Exercise 4.2.2. Show that the function f defined in Example 4.2.2 is real ana- 
lytic on all of R\{1}. 


Exercise 4.2.3. Prove Proposition 4.2.6. (Hint: induct on k and use Theorem 
4.1.6(d)). 


Exercise 4.2.4. Use Proposition 4.2.6 and Exercise 4.2.1 to prove Corollary 
4.2.10. 


Exercise 4.2.5. Let a, b be real numbers, and let n ≥ 0 be an integer. Prove 
the identity 

$$(x-a)^n = \sum_{m=0}^{n} \frac{n!}{m!(n-m)!} (b-a)^{n-m} (x-b)^m$$

for any real number x. (Hint: use the binomial formula, Exercise 7.1.4.) Ex- 
plain why this identity is consistent with Taylor’s theorem and Exercise 4.2.1. 
(Note however that Taylor’s theorem cannot be rigorously applied until one 
verifies Exercise 4.2.6 below.) 


Exercise 4.2.6. Using Exercise 4.2.5, show that every polynomial P(x) of one 
variable is real analytic on R. 


Exercise 4.2.7. Let m ≥ 0 be an integer, and let 0 < x < r be real 
numbers. Use Lemma 7.3.3 to establish the identity 

$$\frac{r}{r-x} = \sum_{n=0}^\infty x^n r^{-n}$$

for all x ∈ (−r, r). Using Proposition 4.2.6, conclude the identity 

$$\frac{r}{(r-x)^{m+1}} = \sum_{n=m}^\infty \frac{n!}{m!(n-m)!} x^{n-m} r^{-n}$$

for all integers m ≥ 0 and x ∈ (−r, r). Also explain why the series on the 
right-hand side is absolutely convergent. 


Exercise 4.2.8. Let E be a subset of R, let a be an interior point of E, and let 
f : E → R be a function which is real analytic at a, and has a power series 
expansion 

$$f(x) = \sum_{n=0}^\infty c_n (x-a)^n$$

at a which converges on the interval (a − r, a + r). Let (b − s, b + s) be any 
sub-interval of (a − r, a + r) for some s > 0. 

(a) Prove that |a − b| ≤ r − s, so in particular |a − b| < r. 


(b) Show that for every 0 < ε < r, there exists a C > 0 such that $|c_n| \le 
C(r-\varepsilon)^{-n}$ for all integers n ≥ 0. (Hint: what do we know about the 
radius of convergence of the series $\sum_{n=0}^\infty c_n(x-a)^n$?) 


(c) Show that the numbers $d_0, d_1, \ldots$ given by the formula 

$$d_m := \sum_{n=m}^\infty \frac{n!}{m!(n-m)!} (b-a)^{n-m} c_n \quad \text{for all integers } m \ge 0$$

are well-defined, in the sense that the above series is absolutely conver- 
gent. (Hint: use (b) and the comparison test, Corollary 7.3.2, followed 
by Exercise 4.2.7.) 


(d) Show that for every 0 < ε < s there exists a C > 0 such that 

$$|d_m| \le C(s-\varepsilon)^{-m}$$

for all integers m ≥ 0. (Hint: use the comparison test, and Exercise 
4.2.7.) 


(e) Show that the power series $\sum_{m=0}^\infty d_m (x-b)^m$ is absolutely convergent 
for x ∈ (b − s, b + s) and converges to f(x). (You may need Fubini’s 
theorem for infinite series, Theorem 8.2.2, as well as Exercise 4.2.5.) 


(f) Conclude that f is real analytic at every point in (a − r, a + r). 


4.3 Abel’s theorem 


Let $f(x) = \sum_{n=0}^\infty c_n(x-a)^n$ be a power series centered at a with a radius 
of convergence 0 < R < ∞ strictly between 0 and infinity. From Theo- 
rem 4.1.6 we know that the power series converges absolutely whenever 
|x − a| < R, and diverges when |x − a| > R. However, at the boundary 
|x − a| = R the situation is more complicated; the series may either con- 
verge or diverge (see Exercise 4.1.2). However, if the series does converge 
at the boundary point, then it is reasonably well behaved; in particular, 
it is continuous at that boundary point. 


Theorem 4.3.1 (Abel’s theorem). Let $f(x) = \sum_{n=0}^\infty c_n(x-a)^n$ be a 
power series centered at a with radius of convergence 0 < R < ∞. If the 
power series converges at a + R, then f is continuous at a + R, i.e. 

$$\lim_{x \to a+R:\ x \in (a-R,a+R)} \sum_{n=0}^\infty c_n (x-a)^n = \sum_{n=0}^\infty c_n R^n.$$

Similarly, if the power series converges at a − R, then f is continuous 
at a − R, i.e. 

$$\lim_{x \to a-R:\ x \in (a-R,a+R)} \sum_{n=0}^\infty c_n (x-a)^n = \sum_{n=0}^\infty c_n (-R)^n.$$


Before we prove Abel’s theorem, we need the following lemma. 


Lemma 4.3.2 (Summation by parts formula). Let $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ 
be sequences of real numbers which converge to limits A and B respec- 
tively, i.e., $\lim_{n\to\infty} a_n = A$ and $\lim_{n\to\infty} b_n = B$. Suppose that the sum 
$\sum_{n=0}^\infty (a_{n+1} - a_n) b_n$ is convergent. Then the sum $\sum_{n=0}^\infty a_{n+1}(b_{n+1} - b_n)$ 
is also convergent, and 

$$\sum_{n=0}^\infty (a_{n+1} - a_n) b_n = AB - a_0 b_0 - \sum_{n=0}^\infty a_{n+1}(b_{n+1} - b_n).$$


Proof. See Exercise 4.3.1. 


Remark 4.3.3. One should compare this formula with the more well- 
known integration by parts formula 

$$\int_0^\infty f'(x)g(x)\,dx = f(x)g(x)\Big|_0^\infty - \int_0^\infty f(x)g'(x)\,dx,$$

see Proposition 11.10.1. 


Proof of Abel’s theorem. It will suffice to prove the first claim, i.e., that 

$$\lim_{x \to a+R:\ x \in (a-R,a+R)} \sum_{n=0}^\infty c_n (x-a)^n = \sum_{n=0}^\infty c_n R^n$$

whenever the sum $\sum_{n=0}^\infty c_n R^n$ converges; the second claim will then 
follow (why?) by replacing $c_n$ by $(-1)^n c_n$ in the above claim. If we 
make the substitutions $d_n := c_n R^n$ and $y := \frac{x-a}{R}$, then the above claim 
can be rewritten as 

$$\lim_{y \to 1:\ y \in (-1,1)} \sum_{n=0}^\infty d_n y^n = \sum_{n=0}^\infty d_n$$

whenever the sum $\sum_{n=0}^\infty d_n$ converges. (Why is this equivalent to the 
previous claim?) 
Write $D := \sum_{n=0}^\infty d_n$, and for every N ≥ 0 write 

$$S_N := \left(\sum_{n=0}^{N-1} d_n\right) - D$$

so in particular $S_0 = -D$. Then observe that $\lim_{N\to\infty} S_N = 0$, and that 
$d_n = S_{n+1} - S_n$. Thus for any y ∈ (−1,1) we have 

$$\sum_{n=0}^\infty d_n y^n = \sum_{n=0}^\infty (S_{n+1} - S_n) y^n.$$

Applying the summation by parts formula (Lemma 4.3.2), and noting 
that $\lim_{n\to\infty} y^n = 0$, we obtain 

$$\sum_{n=0}^\infty d_n y^n = -S_0 y^0 - \sum_{n=0}^\infty S_{n+1}(y^{n+1} - y^n).$$

Observe that $-S_0 y^0 = +D$. Thus to finish the proof of Abel’s theorem, 
it will suffice to show that 

$$\lim_{y \to 1:\ y \in (-1,1)} \sum_{n=0}^\infty S_{n+1}(y^{n+1} - y^n) = 0.$$

Since y converges to 1, we may as well restrict y to [0,1) instead of 
(−1,1); in particular we may take y to be positive. 
From the triangle inequality for series (Proposition 7.2.9), we have 

$$\left|\sum_{n=0}^\infty S_{n+1}(y^{n+1} - y^n)\right| \le \sum_{n=0}^\infty |S_{n+1}(y^{n+1} - y^n)| = \sum_{n=0}^\infty |S_{n+1}|(y^n - y^{n+1}),$$

so by the squeeze test (Corollary 6.4.14) it suffices to show that 

$$\lim_{y \to 1:\ y \in [0,1)} \sum_{n=0}^\infty |S_{n+1}|(y^n - y^{n+1}) = 0.$$

The expression $\sum_{n=0}^\infty |S_{n+1}|(y^n - y^{n+1})$ is clearly non-negative, so it will 
suffice to show that 

$$\limsup_{y \to 1:\ y \in [0,1)} \sum_{n=0}^\infty |S_{n+1}|(y^n - y^{n+1}) = 0.$$


Let ε > 0. Since $S_n$ converges to 0, there exists an N such that $|S_n| \le \varepsilon$ 
for all n > N. Thus we have 

$$\sum_{n=0}^\infty |S_{n+1}|(y^n - y^{n+1}) \le \sum_{n=0}^{N} |S_{n+1}|(y^n - y^{n+1}) + \sum_{n=N+1}^\infty \varepsilon (y^n - y^{n+1}).$$

The last summation is a telescoping series, which sums to $\varepsilon y^{N+1}$ (see 
Lemma 7.2.15, recalling from Lemma 6.5.2 that $y^n \to 0$ as n → ∞), and 
thus 

$$\sum_{n=0}^\infty |S_{n+1}|(y^n - y^{n+1}) \le \sum_{n=0}^{N} |S_{n+1}|(y^n - y^{n+1}) + \varepsilon.$$

Now take limits as y → 1. Observe that $y^n - y^{n+1} \to 0$ as y → 1 for 
every n ∈ {0, 1, ..., N}. Since we can interchange limits and finite sums 
(Exercise 7.1.5), we thus have 

$$\limsup_{y \to 1:\ y \in [0,1)} \sum_{n=0}^\infty |S_{n+1}|(y^n - y^{n+1}) \le \varepsilon.$$

But ε > 0 was arbitrary, and thus we must have 

$$\limsup_{y \to 1:\ y \in [0,1)} \sum_{n=0}^\infty |S_{n+1}|(y^n - y^{n+1}) = 0$$

since the left-hand side must be non-negative. The claim follows. 


— Exercises — 


Exercise 4.3.1. Prove Lemma 4.3.2. (Hint: first work out the relationship 
between the partial sums $\sum_{n=0}^{N} (a_{n+1} - a_n) b_n$ and $\sum_{n=0}^{N} a_{n+1}(b_{n+1} - b_n)$.) 


4.4 Multiplication of power series 


We now show that the product of two real analytic functions is again 
real analytic. 


Theorem 4.4.1. Let f : (a − r, a + r) → R and g : (a − r, a + r) → R 
be functions analytic on (a − r, a + r), with power series expansions 

$$f(x) = \sum_{n=0}^\infty c_n (x-a)^n$$

and 

$$g(x) = \sum_{n=0}^\infty d_n (x-a)^n$$

respectively. Then fg : (a − r, a + r) → R is also analytic on (a − r, a + r), 
with power series expansion 

$$f(x)g(x) = \sum_{n=0}^\infty e_n (x-a)^n$$

where $e_n := \sum_{m=0}^{n} c_m d_{n-m}$. 


Remark 4.4.2. The sequence $(e_n)_{n=0}^\infty$ is sometimes referred to as the 
convolution of the sequences $(c_n)_{n=0}^\infty$ and $(d_n)_{n=0}^\infty$; it is closely related 
(though not identical) to the notion of convolution introduced in Defi- 
nition 3.8.9. 
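
The formula for $e_n$ in Theorem 4.4.1 is straightforward to compute for truncated coefficient lists. The sketch below uses exact rationals and a hypothetical helper `cauchy_product`; as a test case it multiplies the series $\sum x^n/n!$ by itself, for which the identity $\sum_{m=0}^{n} \frac{1}{m!(n-m)!} = \frac{2^n}{n!}$ (a consequence of the binomial formula) predicts the answer.

```python
from fractions import Fraction
from math import factorial


def cauchy_product(c, d):
    """Return e_n = sum_{m=0}^{n} c_m d_{n-m} for n < min(len(c), len(d))."""
    size = min(len(c), len(d))
    return [sum(c[m] * d[n - m] for m in range(n + 1)) for n in range(size)]


if __name__ == "__main__":
    # Coefficients 1/n! of the series sum x^n / n!, truncated to 8 terms.
    c = [Fraction(1, factorial(n)) for n in range(8)]
    e = cauchy_product(c, c)
    # Multiplying the series by itself should give coefficients 2^n / n!.
    print(e == [Fraction(2**n, factorial(n)) for n in range(8)])
```

Only the first min(len(c), len(d)) coefficients are returned, because later ones would need terms that the truncated inputs do not contain.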


Proof. We have to show that the series $\sum_{n=0}^\infty e_n(x-a)^n$ converges to 
f(x)g(x) for all x ∈ (a − r, a + r). Now fix x to be any point in (a − r, a + r). 
By Theorem 4.1.6, we see that both f and g have radii of convergence at 
least r. In particular, the series $\sum_{n=0}^\infty c_n(x-a)^n$ and $\sum_{n=0}^\infty d_n(x-a)^n$ 
are absolutely convergent. Thus if we define 

$$C := \sum_{n=0}^\infty |c_n (x-a)^n|$$

and 

$$D := \sum_{n=0}^\infty |d_n (x-a)^n|$$

then C and D are both finite. 

For any N ≥ 0, consider the partial sum 

$$\sum_{n=0}^{N} \sum_{m=0}^\infty |c_m (x-a)^m d_n (x-a)^n|.$$

We can rewrite this as 

$$\sum_{n=0}^{N} |d_n (x-a)^n| \sum_{m=0}^\infty |c_m (x-a)^m|,$$

which by definition of C is equal to 

$$\sum_{n=0}^{N} |d_n (x-a)^n|\, C,$$

which by definition of D is less than or equal to DC. Thus the above 
partial sums are bounded by DC for every N. In particular, the series 

$$\sum_{n=0}^\infty \sum_{m=0}^\infty |c_m (x-a)^m d_n (x-a)^n|$$

is convergent, which means that the sum 

$$\sum_{n=0}^\infty \sum_{m=0}^\infty c_m (x-a)^m d_n (x-a)^n$$

is absolutely convergent. 
Let us now compute this sum in two ways. First of all, we can pull 
the $d_n(x-a)^n$ factor out of the $\sum_{m=0}^\infty$ summation, to obtain 

$$\sum_{n=0}^\infty d_n (x-a)^n \sum_{m=0}^\infty c_m (x-a)^m.$$

By our formula for f(x), this is equal to 

$$\sum_{n=0}^\infty d_n (x-a)^n f(x);$$

by our formula for g(x), this is equal to f(x)g(x). Thus 

$$f(x)g(x) = \sum_{n=0}^\infty \sum_{m=0}^\infty c_m (x-a)^m d_n (x-a)^n.$$

Now we compute this sum in a different way. We rewrite it as 

$$f(x)g(x) = \sum_{n=0}^\infty \sum_{m=0}^\infty c_m d_n (x-a)^{n+m}.$$

By Fubini’s theorem for series (Theorem 8.2.2), because the series was 
absolutely convergent, we may rewrite it as 

$$f(x)g(x) = \sum_{m=0}^\infty \sum_{n=0}^\infty c_m d_n (x-a)^{n+m}.$$

Now make the substitution n′ := n + m, to rewrite this as 

$$f(x)g(x) = \sum_{m=0}^\infty \sum_{n'=m}^\infty c_m d_{n'-m} (x-a)^{n'}.$$

If we adopt the convention that $d_j = 0$ for all negative j, then this is 
equal to 

$$f(x)g(x) = \sum_{m=0}^\infty \sum_{n'=0}^\infty c_m d_{n'-m} (x-a)^{n'}.$$

Applying Fubini’s theorem again, we obtain 

$$f(x)g(x) = \sum_{n'=0}^\infty \sum_{m=0}^\infty c_m d_{n'-m} (x-a)^{n'},$$

which we can rewrite as 

$$f(x)g(x) = \sum_{n'=0}^\infty (x-a)^{n'} \sum_{m=0}^\infty c_m d_{n'-m}.$$

Since $d_j$ was 0 when j is negative, we can rewrite this as 

$$f(x)g(x) = \sum_{n'=0}^\infty (x-a)^{n'} \sum_{m=0}^{n'} c_m d_{n'-m},$$

which by definition of e is 

$$f(x)g(x) = \sum_{n'=0}^\infty e_{n'} (x-a)^{n'},$$

as desired. 


4.5 The exponential and logarithm functions 


We can now use the machinery developed in the last few sections to de- 
velop a rigorous foundation for many standard functions used in math- 
ematics. We begin with the exponential function. 


Definition 4.5.1 (Exponential function). For every real number x, we 
define the exponential function exp(x) to be the real number 

$$\exp(x) := \sum_{n=0}^\infty \frac{x^n}{n!}.$$


Theorem 4.5.2 (Basic properties of exponential). 


(a) For every real number x, the series $\sum_{n=0}^\infty \frac{x^n}{n!}$ is absolutely conver- 
gent. In particular, exp(x) exists and is real for every x ∈ R, the 
power series $\sum_{n=0}^\infty \frac{x^n}{n!}$ has an infinite radius of convergence, and 
exp is a real analytic function on (−∞, ∞). 


(b) exp is differentiable on R, and for every x ∈ R, exp′(x) = exp(x). 


(c) exp is continuous on R, and for every interval [a,b], we have 
$\int_{[a,b]} \exp(x)\,dx = \exp(b) - \exp(a)$. 


(d) For every x, y ∈ R, we have exp(x + y) = exp(x) exp(y). 


(e) We have exp(0) = 1. Also, for every x ∈ R, exp(x) is positive, 
and exp(−x) = 1/exp(x). 


(f) exp is strictly monotone increasing: in other words, if x, y are real 
numbers, then we have exp(y) > exp(x) if and only if y > x. 


Proof. See Exercise 4.5.1. 


One can write the exponential function in a more compact form, 
introducing the famous Euler’s number e = 2.71828183..., also known as 
the base of the natural logarithm: 


Definition 4.5.3 (Euler’s number). The number e is defined to be 

$$e := \exp(1) = \sum_{n=0}^\infty \frac{1}{n!}.$$


Proposition 4.5.4. For every real number x, we have exp(x) = $e^x$. 


Proof. See Exercise 4.5.3. 


In light of this proposition we can and will use $e^x$ and exp(x) inter- 
changeably. 
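
The defining series converges quickly enough to be summed directly. The following minimal sketch truncates the series after a fixed number of terms; the helper name `exp_partial_sum` and the default of 60 terms are assumptions chosen for illustration, and the comparison with `math.e` is a floating-point sanity check rather than a proof of anything.

```python
import math


def exp_partial_sum(x, terms=60):
    """Sum x^n / n! for n = 0, ..., terms - 1, building each term from the last."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # x^(n+1)/(n+1)! = (x^n/n!) * x/(n+1)
    return total


if __name__ == "__main__":
    print(exp_partial_sum(1.0), math.e)
    x, y = 0.7, -1.3
    # Theorem 4.5.2(d): exp(x + y) = exp(x) exp(y), up to rounding.
    print(exp_partial_sum(x + y), exp_partial_sum(x) * exp_partial_sum(y))
```

For moderate |x| the truncation error is far below floating-point precision; for large |x| more terms (or a different method) would be needed.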

Since e > 1 (why?), we see that $e^x \to +\infty$ as x → +∞, and $e^x \to 0$ 
as x → −∞. From this and the intermediate value theorem (Theorem 
9.7.1) we see that the range of the function exp is (0,∞). Since exp is 
increasing, it is injective, and hence exp is a bijection from R to (0,∞), 
and thus has an inverse from (0,∞) → R. This inverse has a name: 


Definition 4.5.5 (Logarithm). We define the natural logarithm function 
log : (0,∞) → R (also called ln) to be the inverse of the exponential 
function. Thus exp(log(x)) = x and log(exp(x)) = x. 


Since exp is continuous and strictly monotone increasing, we see that 
log is also continuous and strictly monotone increasing (see Proposition 
9.8.3). Since exp is also differentiable, and the derivative is never zero, 
we see from the inverse function theorem (Theorem 10.4.2) that log is 
also differentiable. We list some other properties of the natural logarithm 
below. 

Theorem 4.5.6 (Logarithm properties). 

(a) For every x ∈ (0,∞), we have ln′(x) = 1/x. In particular, by the 
fundamental theorem of calculus, we have $\int_{[a,b]} \frac{1}{x}\,dx = \ln(b) - \ln(a)$ 
for any interval [a,b] in (0,∞). 

(b) We have ln(xy) = ln(x) + ln(y) for all x, y ∈ (0,∞). 

(c) We have ln(1) = 0 and ln(1/x) = −ln(x) for all x ∈ (0,∞). 

(d) For any x ∈ (0,∞) and y ∈ R, we have $\ln(x^y) = y \ln(x)$. 

(e) For any x ∈ (−1,1), we have 

$$\ln(1-x) = -\sum_{n=1}^\infty \frac{x^n}{n}.$$

In particular, ln is analytic at 1, with the power series expansion 

$$\ln(x) = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} (x-1)^n$$

for x ∈ (0,2), with radius of convergence 1. 


Proof. See Exercise 4.5.5. 


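Example 4.5.7. We now give a modest application of Abel’s the- 
orem (Theorem 4.3.1): from the alternating series test we see that 
$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}$ is convergent. By Abel’s theorem we thus see that 

$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = \lim_{x \to 2} \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} (x-1)^n = \lim_{x \to 2} \ln(x) = \ln(2),$$

thus we have the formula 

$$\ln(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots.$$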
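
Abel's theorem in this example can be observed numerically. The sketch below is illustrative only: `log_series` is a hypothetical helper that truncates the power series of ln around 1, and the numbers of terms are arbitrary choices. Inside (0,2) the truncated series matches `math.log` closely, while at the endpoint x = 2 convergence is slow but the partial sums still approach ln(2).

```python
import math


def log_series(x, terms):
    """Partial sum of sum_{n=1}^{terms} (-1)^(n+1) (x - 1)^n / n."""
    return sum((-1) ** (n + 1) * (x - 1) ** n / n for n in range(1, terms + 1))


if __name__ == "__main__":
    # Inside (0, 2) the series matches ln(x) closely once enough terms are used.
    for x in (1.5, 1.9, 1.99):
        print(x, log_series(x, 5000), math.log(x))
    # At the endpoint x = 2 convergence is slow but the limit is still ln(2).
    print(log_series(2.0, 100000), math.log(2))
```
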
— Exercises — 


Exercise 4.5.1. Prove Theorem 4.5.2. (Hints: for part (a), use the ratio test. 
For parts (bc), use Theorem 4.1.6. For part (d), use Theorem 4.4.1. For part 
(e), use part (d). For part (f), use part (d), and prove that exp(x) > 1 when x 
is positive. You may find the binomial formula from Exercise 7.1.4 to be useful.) 


Exercise 4.5.2. Show that for every integer n ≥ 3, we have 

$$0 < \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \cdots < \frac{1}{n!}.$$

(Hint: first show that $(n+k)! > 2^k n!$ for all k = 1, 2, 3, ....) Conclude that n!e 
is not an integer for every n ≥ 3. Deduce from this that e is irrational. (Hint: 
prove by contradiction.) 


Exercise 4.5.3. Prove Proposition 4.5.4. (Hint: first prove the claim when x 
is a natural number. Then prove it when x is an integer. Then prove it when 
x is a rational number. Then use the fact that real numbers are the limits of 
rational numbers to prove it for all real numbers. You may find the exponent 
laws (Proposition 6.7.3) to be useful.) 


Exercise 4.5.4. Let f : R → R be the function defined by setting f(x) := 
exp(−1/x) when x > 0, and f(x) := 0 when x ≤ 0. Prove that f is infinitely 
differentiable, and $f^{(k)}(0) = 0$ for every integer k ≥ 0, but that f is not real 
analytic at 0. 


Exercise 4.5.5. Prove Theorem 4.5.6. (Hints: for part (a), use the inverse 
function theorem (Theorem 10.4.2) or the chain rule (Theorem 10.1.15). For 
parts (bcd), use Theorem 4.5.2 and the exponent laws (Proposition 6.7.3). For 
part (e), start with the geometric series formula (Lemma 7.3.3) and integrate 
using Theorem 4.1.6.)


Exercise 4.5.6. Prove that the natural logarithm function is real analytic on
$(0, +\infty)$.

Exercise 4.5.7. Let $f : \mathbf{R} \to (0,\infty)$ be a positive, real analytic function such
that $f'(x) = f(x)$ for all $x \in \mathbf{R}$. Show that $f(x) = Ce^x$ for some positive
constant C; justify your reasoning. (Hint: there are basically three different
proofs available. One proof uses the logarithm function, another proof uses the
function $e^{-x}$, and a third proof uses power series. Of course, you only need to
supply one proof.)


Exercise 4.5.8. Let $m > 0$ be an integer. Show that

$$\lim_{x \to +\infty} \frac{e^x}{x^m} = +\infty.$$

(Hint: what happens to the ratio between $e^{x+1}/(x+1)^m$ and $e^x/x^m$ as $x \to
+\infty$?)


Exercise 4.5.9. Let P(x) be a polynomial, and let $c > 0$. Show that there
exists a real number $N > 0$ such that $e^{cx} > |P(x)|$ for all $x > N$; thus an
exponentially growing function, no matter how small the growth rate c, will
eventually overtake any given polynomial P(x), no matter how large. (Hint:
use Exercise 4.5.8.)


Exercise 4.5.10. Let $f : (0,+\infty) \times \mathbf{R} \to \mathbf{R}$ be the exponential function
$f(x,y) := x^y$. Show that f is continuous. (Hint: note that Propositions
9.4.10, 9.4.11 only show that f is continuous in each variable, which is
insufficient, as Exercise 2.2.11 shows. The easiest way to proceed is to write
$f(x,y) = \exp(y \ln x)$ and use the continuity of exp() and ln(). For an extra
challenge, try proving this exercise without using the logarithm function.)


4.6 A digression on complex numbers 


To proceed further we need the complex number system C, which is an 
extension of the real number system R. A full discussion of this im- 
portant number system (and in particular the branch of mathematics 
known as complex analysis) is beyond the scope of this text; here, we 
need the system primarily because of a very useful mathematical operation,
the complex exponential function $z \mapsto \exp(z)$, which generalizes
the real exponential function $x \mapsto \exp(x)$ introduced in the previous
section.
Informally, we could define the complex numbers as 


Definition 4.6.1 (Informal definition of complex numbers). The com- 
plex numbers C are the set of all numbers of the form a+ bi, where a, b 
are real numbers and i is a square root of $-1$, i.e., $i^2 = -1$.
However, this definition is a little unsatisfactory as it does not explain
how to add, multiply, or compare two complex numbers. To construct
the complex numbers rigorously we will first introduce a formal version
of the complex number $a+bi$, which we shall temporarily denote as (a, b);
this is similar to how in Chapter 4, when constructing the integers Z,
we needed a formal notion of subtraction $a{-}{-}b$ before the actual notion
of subtraction $a - b$ could be introduced, or how when constructing the
rational numbers, a formal notion of division $a//b$ was needed before it
was superseded by the actual notion $a/b$ of division. It is also similar to
how, in the construction of the real numbers, we defined a formal limit
$\mathrm{LIM}_{n \to \infty} a_n$, before we defined a genuine limit $\lim_{n \to \infty} a_n$.


Definition 4.6.2 (Formal definition of complex numbers). A complex 
number is any pair of the form (a, b), where a, b are real numbers, thus for 
instance (2,4) is a complex number. Two complex numbers (a, b), (c, d) 
are said to be equal iff $a = c$ and $b = d$, thus for instance $(2+1,3+4) =
(3,7)$, but $(2,1) \neq (1,2)$ and $(2,4) \neq (2,-4)$. The set of all complex
numbers is denoted C. 


At this stage the complex numbers C are indistinguishable from the
Cartesian product $\mathbf{R}^2 = \mathbf{R} \times \mathbf{R}$ (also known as the Cartesian plane).
However, we will introduce a number of operations on the complex numbers,
notably that of complex multiplication, which are not normally
placed on the Cartesian plane $\mathbf{R}^2$. Thus one can think of the complex
number system C as the Cartesian plane $\mathbf{R}^2$ equipped with a number of
additional structures. We begin with the notion of addition and negation.
Using the informal definition of the complex numbers, we expect


$$(a,b) + (c,d) = (a+bi) + (c+di) = (a+c) + (b+d)i = (a+c, b+d)$$

and similarly

$$-(a,b) = -(a+bi) = (-a) + (-b)i = (-a,-b).$$


As these derivations used the informal definition of the complex num- 
bers, these identities have not yet been rigorously proven. However we 
shall simply encode these identities into our complex number system by 
defining the notion of addition and negation by the above rules: 


Definition 4.6.3 (Complex addition, negation, and zero). If z = (a,b) 
and w = (c,d) are two complex numbers, we define their sum z+ w 
to be the complex number z+ w := (a+c,b+d). Thus for instance 
$(2,4) + (3,1) = (5,5)$. We also define the negation $-z$ of z to be the
complex number $-z := (-a,-b)$, thus for instance $-(3,-1) = (-3,1)$.
We also define the complex zero $0_{\mathbf{C}}$ to be the complex number $0_{\mathbf{C}} :=
(0,0)$.


It is easy to see that the notion of addition is well-defined in the sense
that if $z = z'$ and $w = w'$ then $z + w = z' + w'$. Similarly for negation.
The complex addition, negation, and zero operations obey the usual laws 
of arithmetic: 


Lemma 4.6.4 (The complex numbers are an additive group). If
$z_1, z_2, z_3$ are complex numbers, then we have the commutative property
$z_1 + z_2 = z_2 + z_1$, the associative property $(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$,
the identity property $z_1 + 0_{\mathbf{C}} = 0_{\mathbf{C}} + z_1 = z_1$, and the inverse property
$z_1 + (-z_1) = (-z_1) + z_1 = 0_{\mathbf{C}}$.


Proof. See Exercise 4.6.1. 


Next, we define the notion of complex multiplication and reciprocal. 
The informal justification of the complex multiplication rule is 


$$(a,b) \cdot (c,d) = (a+bi)(c+di)$$
$$= ac + adi + bic + bidi$$
$$= (ac - bd) + (ad + bc)i$$
$$= (ac - bd, ad + bc)$$

since $i^2$ is supposed to equal $-1$. Thus we define


Definition 4.6.5 (Complex multiplication). If z = (a,b) and w = (c,d) 
are complex numbers, then we define their product zw to be the complex 
number $zw := (ac - bd, ad + bc)$. We also introduce the complex identity
$1_{\mathbf{C}} := (1,0)$.


This operation is easily seen to be well-defined, and also obeys the 
usual laws of arithmetic: 


Lemma 4.6.6. If $z_1, z_2, z_3$ are complex numbers, then we have the
commutative property $z_1 z_2 = z_2 z_1$, the associative property $(z_1 z_2) z_3 =
z_1 (z_2 z_3)$, the identity property $z_1 1_{\mathbf{C}} = 1_{\mathbf{C}} z_1 = z_1$, and the distributivity
properties $z_1(z_2 + z_3) = z_1 z_2 + z_1 z_3$ and $(z_2 + z_3) z_1 = z_2 z_1 + z_3 z_1$.


Proof. See Exercise 4.6.2. 
The above lemma can also be stated more succinctly, as the assertion
that C is a commutative ring. As is usual, we now write $z - w$ as
shorthand for $z + (-w)$.

We now identify the real numbers R with a subset of the complex
numbers C by identifying any real number x with the complex number
$(x,0)$, thus $x = (x,0)$. Note that this identification is consistent with
equality (thus $x = y$ iff $(x,0) = (y,0)$), with addition ($x_1 + x_2 = x_3$ iff
$(x_1,0) + (x_2,0) = (x_3,0)$), with negation ($x = -y$ iff $(x,0) = -(y,0)$),
and multiplication ($x_1 x_2 = x_3$ iff $(x_1,0)(x_2,0) = (x_3,0)$), so we will
no longer need to distinguish between "real addition" and "complex
addition", and similarly for equality, negation, and multiplication. For
instance, we can compute $3(2,4)$ by identifying the real number 3 with
the complex number $(3,0)$ and then computing $(3,0)(2,4) = (3 \times 2 -
0 \times 4, 3 \times 4 + 0 \times 2) = (6,12)$. Note also that $0 = 0_{\mathbf{C}}$ and $1 = 1_{\mathbf{C}}$, so we
can now drop the C subscripts from the zero 0 and the identity 1.

We now define i to be the complex number $i := (0,1)$. We can now
reconstruct the informal definition of the complex numbers as a lemma: 


Lemma 4.6.7. Every complex number $z \in \mathbf{C}$ can be written as $z = a + bi$
for exactly one pair a, b of real numbers. Also, we have $i^2 = -1$, and
$-z = (-1)z$.


Proof. See Exercise 4.6.3. 
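
The formal construction above can be mirrored quite directly in code, which may help in checking the worked examples. The following Python sketch is only an illustration under our own naming (the class `FormalComplex` and the helper `embed` are hypothetical names, not notation from the text): a complex number is a pair (a, b), with addition, negation, and multiplication given by Definitions 4.6.3 and 4.6.5, and a real number x is identified with the pair (x, 0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormalComplex:
    """A formal complex number (a, b); equality is componentwise."""
    a: float
    b: float

    def __add__(self, other):
        # (a, b) + (c, d) := (a + c, b + d)
        return FormalComplex(self.a + other.a, self.b + other.b)

    def __neg__(self):
        # -(a, b) := (-a, -b)
        return FormalComplex(-self.a, -self.b)

    def __mul__(self, other):
        # (a, b)(c, d) := (ac - bd, ad + bc)
        return FormalComplex(self.a * other.a - self.b * other.b,
                             self.a * other.b + self.b * other.a)


def embed(x):
    """Identify the real number x with the complex number (x, 0)."""
    return FormalComplex(x, 0)


I = FormalComplex(0, 1)  # the complex number i := (0, 1)

if __name__ == "__main__":
    z = FormalComplex(2, 4)
    print(z + FormalComplex(3, 1))   # the sum (2,4) + (3,1)
    print(embed(3) * z)              # the product 3(2,4)
    print(I * I == embed(-1))        # checks i^2 = -1
    print(-z == embed(-1) * z)       # checks -z = (-1)z for this z
```

Of course, checking a few instances in this way does not replace the proofs requested in Exercises 4.6.1 to 4.6.3.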


Because of this lemma, we will now refer to complex numbers in 
the more usual notation a + bi, and discard the formal notation (a, b) 
henceforth. 


Definition 4.6.8 (Real and imaginary parts). If z is a complex number
with the representation $z = a + bi$ for some real numbers a, b, we shall
call a the real part of z and denote $\Re(z) := a$, and call b the imaginary
part of z and denote $\Im(z) := b$, thus for instance $\Re(3+4i) = 3$ and
$\Im(3+4i) = 4$, and in general $z = \Re(z) + i\Im(z)$. Note that z is real iff
$\Im(z) = 0$. We say that z is imaginary iff $\Re(z) = 0$, thus for instance $4i$
is imaginary, while $3+4i$ is neither real nor imaginary, and 0 is both
real and imaginary. We define the complex conjugate $\overline{z}$ of z to be the
complex number $\overline{z} := \Re(z) - i\Im(z)$, thus for instance $\overline{3+4i} = 3-4i$,
$\overline{i} = -i$, and $\overline{3} = 3$.


The operation of complex conjugation has several nice properties: 


Lemma 4.6.9 (Complex conjugation is an involution). Let z, w be complex
numbers, then $\overline{z+w} = \overline{z} + \overline{w}$, $\overline{-z} = -\overline{z}$, and $\overline{zw} = \overline{z}\,\overline{w}$. Also $\overline{\overline{z}} = z$.
Finally, we have $\overline{z} = \overline{w}$ if and only if $z = w$, and $\overline{z} = z$ if and only if z
is real.


Proof. See Exercise 4.6.4. 


The notion of absolute value |x| was defined for rational numbers x 
in Definition 4.3.1, and this definition extends to real numbers in the 
obvious manner. However, we cannot extend this definition directly to 
the complex numbers, as most complex numbers are neither positive 
nor negative. (For instance, we do not classify i as either a positive or 
negative number; see Exercise 4.6.15 for some reasons why). However, 
we can still define absolute value by generalizing the formula $|x| = \sqrt{x^2}$
from Exercise 5.6.3:


Definition 4.6.10 (Complex absolute value). If z = a+ bi is a complex 
number, we define the absolute value |z| of z to be the real number 


$$|z| := \sqrt{a^2 + b^2} = (a^2 + b^2)^{1/2}.$$


From Exercise 5.6.3 we see that this notion of absolute value general- 
izes the notion of real absolute value. The absolute value has a number 
of other good properties: 


Lemma 4.6.11 (Properties of complex absolute value). Let z, w be complex
numbers. Then $|z|$ is a non-negative real number, and $|z| = 0$ if and
only if $z = 0$. Also we have the identity $z\overline{z} = |z|^2$, and so $|z| = \sqrt{z\overline{z}}$. As
a consequence we have $|zw| = |z||w|$ and $|\overline{z}| = |z|$. Finally, we have the
inequalities

$$-|z| \leq \Re(z) \leq |z|; \quad -|z| \leq \Im(z) \leq |z|; \quad |z| \leq |\Re(z)| + |\Im(z)|$$

as well as the triangle inequality $|z + w| \leq |z| + |w|$.


Proof. See Exercise 4.6.6. 


Using the notion of absolute value, we can define a notion of recip- 
rocal: 


Definition 4.6.12 (Complex reciprocal). If z is a non-zero complex
number, we define the reciprocal $z^{-1}$ of z to be the complex number
$z^{-1} := |z|^{-2}\overline{z}$ (note that $|z|^{-2}$ is well-defined as a positive real number
because $|z|$ is positive real, thanks to Lemma 4.6.11). Thus for instance
$(1+2i)^{-1} = |1+2i|^{-2}(1-2i) = (1^2+2^2)^{-1}(1-2i) = \frac{1}{5} - \frac{2}{5}i$. If z is
zero, $z = 0$, we leave the reciprocal $0^{-1}$ undefined.
From the definition and Lemma 4.6.11, we see that

$$zz^{-1} = z^{-1}z = |z|^{-2}\overline{z}z = |z|^{-2}|z|^2 = 1,$$

and so $z^{-1}$ is indeed the reciprocal of z. We can thus define a notion of
quotient $z/w$ for any two complex numbers z, w with $w \neq 0$ in the usual
manner by the formula $z/w := zw^{-1}$.
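
The worked example $(1+2i)^{-1} = \frac{1}{5} - \frac{2}{5}i$ can be checked numerically. The sketch below uses Python's built-in `complex` type and implements the reciprocal exactly as in Definition 4.6.12, as the conjugate divided by $|z|^2 = a^2 + b^2$; the function names `reciprocal` and `quotient` are our own choices for illustration.

```python
def reciprocal(z: complex) -> complex:
    """Return |z|^(-2) * conjugate(z), following Definition 4.6.12."""
    if z == 0:
        # The text leaves the reciprocal of 0 undefined.
        raise ZeroDivisionError("the reciprocal of 0 is undefined")
    abs_squared = z.real ** 2 + z.imag ** 2  # |z|^2 = a^2 + b^2
    return z.conjugate() / abs_squared


def quotient(z: complex, w: complex) -> complex:
    """Return z / w := z * w^(-1) for non-zero w."""
    return z * reciprocal(w)


if __name__ == "__main__":
    z = 1 + 2j
    print(reciprocal(z))        # approximately 0.2 - 0.4i
    print(z * reciprocal(z))    # approximately 1
```

Since floating-point arithmetic is inexact, the product `z * reciprocal(z)` should only be expected to equal 1 up to rounding error.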

The complex numbers can be given a distance by defining $d(z,w) :=
|z - w|$.


Lemma 4.6.13. The complex numbers C with the distance d form a
metric space. If $(z_n)_{n=1}^{\infty}$ is a sequence of complex numbers, and z is
another complex number, then we have $\lim_{n \to \infty} z_n = z$ in this metric
space if and only if $\lim_{n \to \infty} \Re(z_n) = \Re(z)$ and $\lim_{n \to \infty} \Im(z_n) = \Im(z)$.


Proof. See Exercise 4.6.9. 


This metric space is in fact complete and connected, but not com- 
pact: see Exercises 4.6.10, 4.6.12, 4.6.13. We also have the usual limit 
laws: 


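Lemma 4.6.14 (Complex limit laws). Let $(z_n)_{n=1}^{\infty}$ and $(w_n)_{n=1}^{\infty}$ be
convergent sequences of complex numbers, and let c be a complex number.
Then the sequences $(z_n + w_n)_{n=1}^{\infty}$, $(z_n - w_n)_{n=1}^{\infty}$, $(cz_n)_{n=1}^{\infty}$, $(z_n w_n)_{n=1}^{\infty}$,
and $(\overline{z_n})_{n=1}^{\infty}$ are also convergent, with

$$\lim_{n \to \infty} (z_n + w_n) = \lim_{n \to \infty} z_n + \lim_{n \to \infty} w_n$$
$$\lim_{n \to \infty} (z_n - w_n) = \lim_{n \to \infty} z_n - \lim_{n \to \infty} w_n$$
$$\lim_{n \to \infty} cz_n = c \lim_{n \to \infty} z_n$$
$$\lim_{n \to \infty} z_n w_n = \left(\lim_{n \to \infty} z_n\right)\left(\lim_{n \to \infty} w_n\right)$$
$$\lim_{n \to \infty} \overline{z_n} = \overline{\lim_{n \to \infty} z_n}.$$

Also, if the $w_n$ are all non-zero and $\lim_{n \to \infty} w_n$ is also non-zero, then
$(z_n/w_n)_{n=1}^{\infty}$ is also a convergent sequence, with

$$\lim_{n \to \infty} z_n/w_n = \left(\lim_{n \to \infty} z_n\right) / \left(\lim_{n \to \infty} w_n\right).$$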


Proof. See Exercise 4.6.14. 
Observe that the real and complex number systems are in fact quite
similar; they both obey similar laws of arithmetic, and they have similar
structure as metric spaces. Indeed many of the results in this
textbook that were proven for real-valued functions are also valid for
complex-valued functions, simply by replacing "real" with "complex" in
the proofs but otherwise leaving all the other details of the proof
unchanged. Alternatively, one can always split a complex-valued function
f into real and imaginary parts $\Re(f)$, $\Im(f)$, thus $f = \Re(f) + i\Im(f)$, and
then deduce results for the complex-valued function f from the
corresponding results for the real-valued functions $\Re(f)$, $\Im(f)$. For instance,
the theory of pointwise and uniform convergence from Chapter 3, or the 
theory of power series from this chapter, extends without any difficulty 
to complex-valued functions. In particular, we can define the complex 
exponential function in exactly the same manner as for real numbers: 


Definition 4.6.15 (Complex exponential). If z is a complex number,
we define the function exp(z) by the formula

$$\exp(z) := \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$


One can state and prove the ratio test for complex series and 
use it to show that exp(z) converges for every z. It turns out that 
many of the properties from Theorem 4.5.2 still hold: we have that 
exp(z + w) = exp(z)exp(w), for instance; see Exercise 4.6.16. (The 
other properties require complex differentiation and complex integra- 
tion, but these topics are beyond the scope of this text.) Another useful 
observation is that $\overline{\exp(z)} = \exp(\overline{z})$; this can be seen by conjugating the
partial sums $\sum_{n=0}^{N} \frac{z^n}{n!}$ and taking limits as $N \to \infty$.
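
The partial sums $\sum_{n=0}^{N} z^n/n!$ are easy to compute, which gives a way to experiment with the identities just mentioned. The following Python sketch is a numerical illustration only; the function name `exp_partial_sum` and the choice N = 60 are our own assumptions, and agreement up to rounding error is of course not a proof.

```python
import cmath


def exp_partial_sum(z: complex, N: int) -> complex:
    """Return the partial sum of z^n / n! for n = 0, ..., N."""
    total = 0j
    term = 1 + 0j              # the n = 0 term, z^0 / 0!
    for n in range(N + 1):
        total += term
        term = term * z / (n + 1)   # turns z^n/n! into z^(n+1)/(n+1)!
    return total


if __name__ == "__main__":
    z, w, N = 1 + 2j, -0.5 + 0.3j, 60
    # exp(z + w) versus exp(z) exp(w)
    print(abs(exp_partial_sum(z + w, N)
              - exp_partial_sum(z, N) * exp_partial_sum(w, N)))
    # conjugate of exp(z) versus exp of the conjugate of z
    print(abs(exp_partial_sum(z, N).conjugate()
              - exp_partial_sum(z.conjugate(), N)))
    # comparison with the library implementation
    print(abs(exp_partial_sum(z, N) - cmath.exp(z)))
```

Each term is obtained from the previous one by multiplying by z/(n+1), which avoids computing large powers and factorials separately.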

The complex logarithm turns out to be somewhat more subtle, 
mainly because exp is no longer invertible, and also because the various 
power series for the logarithm only have a finite radius of convergence 
(unlike exp, which has an infinite radius of convergence). This rather 
delicate issue is beyond the scope of this text and will not be discussed 
here. 


— Exercises — 
Exercise 4.6.1. Prove Lemma 4.6.4. 
Exercise 4.6.2. Prove Lemma 4.6.6. 


Exercise 4.6.3. Prove Lemma 4.6.7. 
Exercise 4.6.4. Prove Lemma 4.6.9.


Exercise 4.6.5. If z is a complex number, show that $\Re(z) = \frac{z + \overline{z}}{2}$ and $\Im(z) =
\frac{z - \overline{z}}{2i}$.


Exercise 4.6.6. Prove Lemma 4.6.11. (Hint: to prove the triangle inequality,
first prove that $\Re(z\overline{w}) \leq |z||w|$, and hence (from Exercise 4.6.5) that $z\overline{w} + \overline{z}w \leq
2|z||w|$. Then add $|z|^2 + |w|^2$ to both sides of this inequality.)


Exercise 4.6.7. Show that if z, w are complex numbers with $w \neq 0$, then $|z/w| =
|z|/|w|$.


Exercise 4.6.8. Let z, w be non-zero complex numbers. Show that $|z + w| =
|z| + |w|$ if and only if there exists a positive real number $c > 0$ such that
$z = cw$.


Exercise 4.6.9. Prove Lemma 4.6.13. 


Exercise 4.6.10. Show that the complex numbers C (with the usual metric d) 
form a complete metric space. 


Exercise 4.6.11. Let $f : \mathbf{R}^2 \to \mathbf{C}$ be the map $f(a,b) := a + bi$. Show that f is
a bijection, and that f and $f^{-1}$ are both continuous maps.


Exercise 4.6.12. Show that the complex numbers C (with the usual metric d) 
form a connected metric space. (Hint: first show that C is path connected, as 
in Exercise 2.4.7.) 


Exercise 4.6.13. Let E be a subset of C. Show that E is compact if and only if
E is closed and bounded. (Hint: combine Exercise 4.6.11 with the Heine-Borel 
theorem, Theorem 1.5.7.) In particular, show that C is not compact. 


Exercise 4.6.14. Prove Lemma 4.6.14. (Hint: split $z_n$ and $w_n$ into real and
imaginary parts and use the usual limit laws, Lemma 6.1.19, combined with 
Lemma 4.6.13.) 


Exercise 4.6.15. The purpose of this exercise is to explain why we do not try to 
organize the complex numbers into positive and negative parts. Suppose that 
there was a notion of a “positive complex number” and a “negative complex 
number” which obeyed the following reasonable axioms (cf. Proposition 4.2.9): 


• (Trichotomy) For every complex number z, exactly one of the following
statements is true: z is positive, z is negative, z is zero.


• (Negation) If z is a positive complex number, then $-z$ is negative. If z
is a negative complex number, then $-z$ is positive.


• (Additivity) If z and w are positive complex numbers, then $z + w$ is also
positive.


• (Multiplicativity) If z and w are positive complex numbers, then zw is
also positive.


Show that these four axioms are inconsistent, i.e., one can use these axioms
to deduce a contradiction. (Hints: first use the axioms to deduce that 1 is
positive, and then conclude that $-1$ is negative. Then apply the Trichotomy
axiom to $z = i$ and obtain a contradiction in any one of the three cases.)


Exercise 4.6.16. Prove the ratio test for complex series, and use it to show 
that the series used to define the complex exponential is absolutely convergent. 
Then prove that exp(z + w) = exp(z) exp(w) for all complex numbers z, w. 


4.7 Trigonometric functions 


We now discuss the next most important class of special functions, af- 
ter the exponential and logarithmic functions, namely the trigonometric 
functions. (There are several other useful special functions in mathemat- 
ics, such as the hyperbolic trigonometric functions and hypergeometric 
functions, the gamma and zeta functions, and elliptic functions, but they 
occur more rarely and will not be discussed here.) 

Trigonometric functions are often defined using geometric concepts, 
notably those of circles, triangles, and angles. However, it is also possi- 
ble to define them using more analytic concepts, and in particular the 
(complex) exponential function. 


Definition 4.7.1 (Trigonometric functions). If z is a complex number,
then we define

$$\cos(z) := \frac{e^{iz} + e^{-iz}}{2}$$

and

$$\sin(z) := \frac{e^{iz} - e^{-iz}}{2i}.$$

We refer to cos and sin as the cosine and sine functions respectively. 


These formulae were discovered by Leonhard Euler (1707-1783) in 
1748, who recognized the link between the complex exponential and the 
trigonometric functions. Note that since we have defined the sine and 
cosine for complex numbers z, we automatically have defined them also 
for real numbers x. In fact in most applications one is only interested 
in the trigonometric functions when applied to real numbers. 

From the power series definition of exp, we have

$$e^{iz} = 1 + iz - \frac{z^2}{2!} - \frac{iz^3}{3!} + \frac{z^4}{4!} + \cdots$$

and

$$e^{-iz} = 1 - iz - \frac{z^2}{2!} + \frac{iz^3}{3!} + \frac{z^4}{4!} - \cdots$$

and so from the above formulae we have

$$\cos(z) = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n}}{(2n)!}$$

and

$$\sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{(2n+1)!}.$$

In particular, cos(x) and sin(x) are always real whenever x is real.


From the ratio test we see that the two power series $\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}$,
$\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$ are absolutely convergent for every x, thus sin(x) and
cos(x) are real analytic at 0 with an infinite radius of convergence. From
Exercise 4.2.8 we thus see that the sine and cosine functions are real an- 
alytic on all of R. (They are also complex analytic on all of C, but 
we will not pursue this matter in this text). In particular the sine and 
cosine functions are continuous and differentiable. 
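
To see that the exponential definition, the power series, and the familiar trigonometric functions all agree, one can compare them numerically. The Python sketch below is an illustration under our own naming (`cos_via_exp`, `sin_via_exp`, `cos_series`, `sin_series` are hypothetical helper names, and truncating the series at 30 terms is an arbitrary choice that is adequate for small real x).

```python
import cmath
import math


def cos_via_exp(z):
    """cos(z) := (e^{iz} + e^{-iz}) / 2, as in Definition 4.7.1."""
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2


def sin_via_exp(z):
    """sin(z) := (e^{iz} - e^{-iz}) / (2i), as in Definition 4.7.1."""
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2j)


def cos_series(x, terms=30):
    """Truncated power series: sum of (-1)^n x^(2n) / (2n)!."""
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
               for n in range(terms))


def sin_series(x, terms=30):
    """Truncated power series: sum of (-1)^n x^(2n+1) / (2n+1)!."""
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))


if __name__ == "__main__":
    x = 1.3
    print(cos_via_exp(x), cos_series(x), math.cos(x))
    print(sin_via_exp(x), sin_series(x), math.sin(x))
```

Note that `cos_via_exp` and `sin_via_exp` return Python complex numbers even for real input; for real x the imaginary part is zero up to rounding, in line with the remark that cos(x) and sin(x) are real whenever x is real.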

We list some basic properties of the sine and cosine functions below. 


Theorem 4.7.2 (Trigonometric identities). Let x, y be real numbers.


(a) We have $\sin(x)^2 + \cos(x)^2 = 1$. In particular, we have $\sin(x) \in
[-1,1]$ and $\cos(x) \in [-1,1]$ for all $x \in \mathbf{R}$.

(b) We have $\sin'(x) = \cos(x)$ and $\cos'(x) = -\sin(x)$.

(c) We have $\sin(-x) = -\sin(x)$ and $\cos(-x) = \cos(x)$.


(d) We have $\cos(x+y) = \cos(x)\cos(y) - \sin(x)\sin(y)$ and $\sin(x+y) =
\sin(x)\cos(y) + \cos(x)\sin(y)$.


(e) We have $\sin(0) = 0$ and $\cos(0) = 1$.


(f) We have $e^{ix} = \cos(x) + i\sin(x)$ and $e^{-ix} = \cos(x) - i\sin(x)$. In
particular $\cos(x) = \Re(e^{ix})$ and $\sin(x) = \Im(e^{ix})$.


Proof. See Exercise 4.7.1. 
Now we describe some other properties of sin and cos. 


Lemma 4.7.3. There exists a positive number x such that sin(x) is 
equal to 0. 
Proof. Suppose for sake of contradiction that $\sin(x) \neq 0$ for all $x \in
(0,\infty)$. Observe that this would also imply that $\cos(x) \neq 0$ for all
$x \in (0,\infty)$, since if $\cos(x) = 0$ then $\sin(2x) = 0$ by Theorem 4.7.2(d)
(why?). Since $\cos(0) = 1$, this implies by the intermediate value theorem
(Theorem 9.7.1) that $\cos(x) > 0$ for all $x > 0$ (why?). Also, since
$\sin(0) = 0$ and $\sin'(0) = 1 > 0$, we see that sin is increasing near 0, hence
is positive to the right of 0. By the intermediate value theorem again
we conclude that $\sin(x) > 0$ for all $x > 0$ (otherwise sin would have a
zero on $(0,\infty)$).

In particular if we define the cotangent function $\cot(x) :=
\cos(x)/\sin(x)$, then cot(x) would be positive and differentiable on all
of $(0,\infty)$. From the quotient rule (Theorem 10.1.13(h)) and Theorem
4.7.2 we see that the derivative of cot(x) is $-1/\sin(x)^2$ (why?). In
particular, we have $\cot'(x) \leq -1$ for all $x > 0$. By the fundamental theorem
of calculus (Theorem 11.9.1) this implies that $\cot(x+s) \leq \cot(x) - s$
for all $x > 0$ and $s > 0$. But letting $s \to \infty$ we see that this contradicts
our assertion that cot is positive on $(0,\infty)$ (why?).


Let E be the set $E := \{x \in (0,+\infty) : \sin(x) = 0\}$, i.e., E is the
set of roots of sin on $(0,+\infty)$. By Lemma 4.7.3, E is non-empty. Since
$\sin'(0) > 0$, there exists a $c > 0$ such that $E \subseteq [c,+\infty)$ (see Exercise
4.7.2). Also, since sin is continuous in $[c,+\infty)$, E is closed in $[c,+\infty)$
(why? use Theorem 2.1.5(d)). Since $[c,+\infty)$ is closed in R, we conclude
that E is closed in R. Thus E contains all its adherent points, and thus
contains inf(E). Thus if we make the definition


Definition 4.7.4. We define $\pi$ to be the number
$$\pi := \inf\{x \in (0,\infty) : \sin(x) = 0\}$$


then we have $\pi \in E \subseteq [c,+\infty)$ (so in particular $\pi > 0$) and $\sin(\pi) =
0$. By definition of $\pi$, sin cannot have any zeroes in $(0,\pi)$, and so in
particular must be positive on $(0,\pi)$ (cf. the arguments in Lemma 4.7.3
using the intermediate value theorem). Since $\cos'(x) = -\sin(x)$, we thus
conclude that cos(x) is strictly decreasing on $(0,\pi)$. Since $\cos(0) = 1$,
this implies in particular that $\cos(\pi) < 1$; since $\sin^2(\pi) + \cos^2(\pi) = 1$
and $\sin(\pi) = 0$, we thus conclude that $\cos(\pi) = -1$.
In particular we have Euler's famous formula

$$e^{\pi i} = \cos(\pi) + i\sin(\pi) = -1.$$
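
Definition 4.7.4 can be explored numerically by approximating the smallest positive root of sin, with sin itself computed from its power series rather than taken from a library. The Python sketch below is an illustration only, under assumptions of our own: the helper names `sin_series` and `smallest_positive_root`, the scan step 0.01, and the tolerance are all arbitrary choices, and the scan would miss a root if the function changed sign twice within a single step (which does not happen here, since sin is positive on $(0,\pi)$).

```python
import math


def sin_series(x, terms=40):
    """Truncated power series for sin: sum of (-1)^n x^(2n+1)/(2n+1)!."""
    total = 0.0
    term = x                      # the n = 0 term
    for n in range(terms):
        total += term
        # ratio of consecutive terms is -x^2 / ((2n+2)(2n+3))
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


def smallest_positive_root(f, step=0.01, upper=10.0, tol=1e-12):
    """Scan right from `step` until f stops being positive, then bisect.

    Assumes f is continuous and positive just to the right of 0.
    """
    lo = step
    if f(lo) <= 0:
        raise ValueError("f is not positive at the first sample point")
    while f(lo + step) > 0:
        lo += step
        if lo > upper:
            raise ValueError("no sign change found below `upper`")
    hi = lo + step                # now f(lo) > 0 and f(hi) <= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    pi_estimate = smallest_positive_root(sin_series)
    print(pi_estimate, abs(pi_estimate - math.pi))
```

The bisection step is just the intermediate value theorem made quantitative: at every stage sin is positive at the left endpoint and non-positive at the right endpoint, so a root lies in between.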


We now conclude with some other properties of sine and cosine. 
Theorem 4.7.5 (Periodicity of trigonometric functions). Let x be a real
number.


(a) We have $\cos(x + \pi) = -\cos(x)$ and $\sin(x + \pi) = -\sin(x)$. In
particular we have $\cos(x + 2\pi) = \cos(x)$ and $\sin(x + 2\pi) = \sin(x)$,
i.e., sin and cos are periodic with period $2\pi$.


(b) We have $\sin(x) = 0$ if and only if $x/\pi$ is an integer.


(c) We have $\cos(x) = 0$ if and only if $x/\pi$ is an integer plus 1/2.


Proof. See Exercise 4.7.3. 


We can of course define all the other trigonometric functions: tan- 
gent, cotangent, secant, and cosecant, and develop all the familiar iden- 
tities of trigonometry; some examples of this are given in the exercises. 


— Exercises — 


Exercise 4.7.1. Prove Theorem 4.7.2. (Hint: write everything in terms of ex- 
ponentials whenever possible.) 


Exercise 4.7.2. Let $f : \mathbf{R} \to \mathbf{R}$ be a function which is differentiable at $x_0$, with
$f(x_0) = 0$ and $f'(x_0) \neq 0$. Show that there exists a $c > 0$ such that f(y) is
non-zero whenever $0 < |x_0 - y| < c$. Conclude in particular that there exists a
$c > 0$ such that $\sin(x) \neq 0$ for all $0 < x < c$.


Exercise 4.7.3. Prove Theorem 4.7.5. (Hint: for (c), you may wish to first
compute $\sin(\pi/2)$ and $\cos(\pi/2)$, and then link cos(x) to $\sin(x + \pi/2)$.)


Exercise 4.7.4. Let x, y be real numbers such that $x^2 + y^2 = 1$. Show that there
is exactly one real number $\theta \in (-\pi,\pi]$ such that $x = \sin(\theta)$ and $y = \cos(\theta)$.
(Hint: you may need to divide into cases depending on whether x, y are positive,
negative, or zero.)


Exercise 4.7.5. Show that if $r, s > 0$ are positive real numbers, and $\theta, \alpha$ are real
numbers such that $re^{i\theta} = se^{i\alpha}$, then $r = s$ and $\theta = \alpha + 2\pi k$ for some integer k.


Exercise 4.7.6. Let z be a non-zero complex number. Using Exercise 4.7.4, show
that there is exactly one pair of real numbers $r, \theta$ such that $r > 0$, $\theta \in (-\pi,\pi]$,
and $z = re^{i\theta}$. (This is sometimes known as the standard polar representation
of z.)


Exercise 4.7.7. For any real number $\theta$ and integer n, prove the de Moivre
identities

$$\cos(n\theta) = \Re((\cos\theta + i\sin\theta)^n); \quad \sin(n\theta) = \Im((\cos\theta + i\sin\theta)^n).$$


Exercise 4.7.8. Let $\tan : (-\pi/2, \pi/2) \to \mathbf{R}$ be the tangent function
$\tan(x) := \sin(x)/\cos(x)$. Show that tan is differentiable and monotone
increasing, with $\frac{d}{dx}\tan(x) = 1 + \tan(x)^2$, and that $\lim_{x \to \pi/2} \tan(x) = +\infty$
and $\lim_{x \to -\pi/2} \tan(x) = -\infty$. Conclude that tan is in fact a bijection from
$(-\pi/2, \pi/2) \to \mathbf{R}$, and thus has an inverse function $\tan^{-1} : \mathbf{R} \to (-\pi/2, \pi/2)$
(this function is called the arctangent function). Show that $\tan^{-1}$ is
differentiable and $\frac{d}{dx}\tan^{-1}(x) = \frac{1}{1+x^2}$.

Exercise 4.7.9. Recall the arctangent function $\tan^{-1}$ from Exercise 4.7.8. By
modifying the proof of Theorem 4.5.6(e), establish the identity

$$\tan^{-1}(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{2n+1}$$

for all $x \in (-1,1)$. Using Abel's theorem (Theorem 4.3.1) to extend this
identity to the case $x = 1$, conclude in particular the identity

$$\pi = 4 - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \cdots = 4\sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}.$$

(Note that the series converges by the alternating series test, Proposition
7.2.12.) Conclude in particular that $4 - \frac{4}{3} < \pi < 4$. (One can of course compute
$\pi = 3.1415926\ldots$ to much higher accuracy, though if one wishes to do so it is
advisable to use a different formula than the one above, which converges very
slowly.)
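
To get a feeling for how slowly the series in Exercise 4.7.9 converges, one can tabulate its partial sums. The Python sketch below is a numerical illustration only (it does not address the proof requested in the exercise); the function name `leibniz_partial_sum` is our own. By the alternating series test, the error after the terms n = 0, ..., N is at most the next term, 4/(2N+3).

```python
import math


def leibniz_partial_sum(N):
    """Return 4 * sum of (-1)^n / (2n+1) for n = 0, ..., N."""
    return 4 * sum((-1) ** n / (2 * n + 1) for n in range(N + 1))


if __name__ == "__main__":
    for N in (1, 10, 1000, 100000):
        s = leibniz_partial_sum(N)
        # The error is bounded by the first omitted term, 4 / (2N + 3).
        print(N, s, abs(s - math.pi), 4 / (2 * N + 3))
```

Roughly speaking, each additional correct decimal digit requires about ten times as many terms, which is why other formulae are preferred in practice.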


Exercise 4.7.10. Let $f : \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \sum_{n=1}^{\infty} 4^{-n} \cos(32^n \pi x).$$

(a) Show that this series is uniformly convergent, and that f is continuous.


(b) Show that for every integer j and every integer $m \geq 1$, we have

$$\left| f\left(\frac{j+1}{32^m}\right) - f\left(\frac{j}{32^m}\right) \right| \geq 4^{-m}.$$

(Hint: use the identity

$$\sum_{n=1}^{\infty} a_n = \left(\sum_{n=1}^{m-1} a_n\right) + a_m + \sum_{n=m+1}^{\infty} a_n$$

for certain sequences $a_n$. Also, use the fact that the cosine function
is periodic with period $2\pi$, as well as the geometric series formula
$\sum_{n=0}^{\infty} r^n = \frac{1}{1-r}$ for any $|r| < 1$. Finally, you will need the
inequality $|\cos(x) - \cos(y)| \leq |x - y|$ for any real numbers x and y; this can
be proven by using the mean value theorem (Corollary 10.2.9), or the
fundamental theorem of calculus (Theorem 11.9.4).)


(c) Using (b), show that for every real number $x_0$, the function f is not
differentiable at $x_0$. (Hint: for every $x_0$ and every $m \geq 1$, there exists
an integer j such that $j \leq 32^m x_0 < j + 1$, thanks to Exercise 5.4.3.)


(d) Explain briefly why the result in (c) does not contradict Corollary 3.7.3. 


Chapter 5 


Fourier series 


In the previous two chapters, we discussed the issue of how certain func- 
tions (for instance, compactly supported continuous functions) could be 
approximated by polynomials. Later, we showed how a different class 
of functions (real analytic functions) could be written exactly (not ap- 
proximately) as an infinite polynomial, or more precisely a power series. 

Power series are already immensely useful, especially when dealing 
with special functions such as the exponential and trigonometric func- 
tions discussed earlier. However, there are some circumstances where 
power series are not so useful, because one has to deal with functions 
(e.g., $\sqrt{x}$) which are not real analytic, and so do not have power series.

Fortunately, there is another type of series expansion, known as 
Fourier series, which is also a very powerful tool in analysis (though 
used for slightly different purposes). Instead of analyzing compactly 
supported functions, it instead analyzes periodic functions; instead of 
decomposing into polynomials, it decomposes into trigonometric poly- 
nomials. Roughly speaking, the theory of Fourier series asserts that just 
about every periodic function can be decomposed as an (infinite) sum 
of sines and cosines. 


Remark 5.0.6. Jean-Baptiste Fourier (1768-1830) was, among other
things, an administrator accompanying Napoleon on his invasion of
Egypt, and then a Prefect in France during Napoleon's reign. After the
Napoleonic wars, he returned to mathematics. He introduced Fourier
series in an important 1807 paper in which he used them to solve what
is now known as the heat equation. At the time, the claim that every
periodic function could be expressed as a sum of sines and cosines was
extremely controversial; even such leading mathematicians as Euler
declared that it was impossible. Nevertheless, Fourier managed to show
that this was indeed the case, although the proof was not completely
rigorous and was not totally accepted for almost another hundred years.


There will be some similarities between the theory of Fourier series 
and that of power series, but there are also some major differences. For 
instance, the convergence of Fourier series is usually not uniform (i.e., 
not in the $L^\infty$ metric), but instead we have convergence in a different
metric, the $L^2$-metric. Also, we will need to use complex numbers heavily
in our theory, while they played only a tangential role in power series. 

The theory of Fourier series (and of related topics such as Fourier in- 
tegrals and the Laplace transform) is vast, and deserves an entire course 
in itself. It has many, many applications, most directly to differential 
equations, signal processing, electrical engineering, physics, and analy- 
sis, but also to algebra and number theory. We will only give the barest 
bones of the theory here, however, and almost no applications. 


5.1 Periodic functions 


The theory of Fourier series has to do with the analysis of periodic func- 
tions, which we now define. It turns out to be convenient to work with 
complex-valued functions rather than real-valued ones. 


Definition 5.1.1. Let $L > 0$ be a real number. A function $f : \mathbf{R} \to \mathbf{C}$
is periodic with period L, or L-periodic, if we have $f(x + L) = f(x)$ for
every real number x.


Example 5.1.2. The real-valued functions $f(x) = \sin(x)$ and $f(x) =
\cos(x)$ are $2\pi$-periodic, as is the complex-valued function $f(x) = e^{ix}$.
These functions are also $4\pi$-periodic, $6\pi$-periodic, etc. (why?). The
function $f(x) = x$, however, is not periodic. The constant function
$f(x) = 1$ is L-periodic for every L.


Remark 5.1.3. If a function f is L-periodic, then we have $f(x + kL) =
f(x)$ for every integer k (why? Use induction for the positive k, and
then use a substitution to convert the positive k result to a negative k
result. The $k = 0$ case is of course trivial). In particular, if a function
f is 1-periodic, then we have $f(x + k) = f(x)$ for every $k \in \mathbf{Z}$. Because
of this, 1-periodic functions are sometimes also called Z-periodic (and
L-periodic functions called LZ-periodic).


Example 5.1.4. For any integer n, the functions $\cos(2\pi nx)$, $\sin(2\pi nx)$,
and $e^{2\pi inx}$ are all Z-periodic. (What happens when n is not an integer?)
Another example of a Z-periodic function is the function $f : \mathbf{R} \to \mathbf{C}$
defined by $f(x) := 1$ when $x \in [n, n + \frac{1}{2})$ for some integer n, and $f(x) := 0$
when $x \in [n + \frac{1}{2}, n + 1)$ for some integer n. This function is an example
of a square wave.


Henceforth, for simplicity, we shall only deal with functions which are 
Z-periodic (for the Fourier theory of L-periodic functions, see Exercise 
5.5.6). Note that in order to completely specify a Z-periodic function
f : R → C, one only needs to specify its values on the interval [0, 1),
since this will determine the values of f everywhere else. This is because
every real number x can be written in the form x = k + y where k is
an integer (called the integer part of x, and sometimes denoted [x])
and y ∈ [0, 1) (this is called the fractional part of x, and sometimes
denoted {x}); see Exercise 5.1.1. Because of this, sometimes when we
wish to describe a Z-periodic function f we just describe what it does
on the interval [0, 1), and then say that it is extended periodically to all
of R. This means that we define f(x) for any real number x by setting
f(x) := f(y), where we have decomposed x = k + y as discussed above.
(One can in fact replace the interval [0, 1) by any other half-open interval 
of length 1, but we will not do so here.) 
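
The periodic extension just described can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the helper names `periodic_extension` and `square_wave_on_unit_interval` are hypothetical, and floating-point numbers stand in for real numbers, so the decomposition x = k + y is only exact up to rounding.

```python
import math


def periodic_extension(values_on_unit_interval):
    """Extend a function given on [0, 1) to a Z-periodic function on R."""
    def extended(x):
        k = math.floor(x)   # integer part of x
        y = x - k           # fractional part of x, in [0, 1) up to rounding
        return values_on_unit_interval(y)
    return extended


def square_wave_on_unit_interval(y):
    # The square wave of Example 5.1.4: 1 on [0, 1/2), 0 on [1/2, 1).
    return 1 if y < 0.5 else 0


square_wave = periodic_extension(square_wave_on_unit_interval)

print(square_wave(0.25), square_wave(0.75))    # prints: 1 0
print(square_wave(3.25), square_wave(-0.25))   # prints: 1 0
```

Note that -0.25 decomposes as (-1) + 0.75, so the extended square wave vanishes there, in agreement with Z-periodicity.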

The space of complex-valued continuous Z-periodic functions is de- 
noted C(R/Z; C). (The notation R/Z comes from algebra, and denotes 
the quotient group of the additive group R by the additive group Z; 
more information on this can be found in any algebra text.) By “contin-
uous” we mean continuous at all points on R; merely being continuous
on an interval such as [0, 1] will not suffice, as there may be a discon-
tinuity between the left and right limits at 1 (or at any other integer).
Thus for instance, the functions sin(2πnx), cos(2πnx), and $e^{2\pi i n x}$ are
all elements of C(R/Z; C), as are the constant functions; however, the
square wave function described earlier is not in C(R/Z; C) because it is
not continuous. The function sin(x) also does not belong to
C(R/Z; C), since it is not Z-periodic.


Lemma 5.1.5 (Basic properties of C(R/Z;C)). 


(a) (Boundedness) If f ∈ C(R/Z; C), then f is bounded (i.e., there
exists a real number M > 0 such that |f(x)| ≤ M for all x ∈ R).


(b) (Vector space and algebra properties) If f, g ∈ C(R/Z; C), then the
functions f + g, f − g, and fg are also in C(R/Z; C). Also, if c is
any complex number, then the function cf is also in C(R/Z; C).


(c) (Closure under uniform limits) If $(f_n)_{n=1}^{\infty}$ is a sequence of func-
tions in C(R/Z; C) which converges uniformly to another function
f : R → C, then f is also in C(R/Z; C).


Proof. See Exercise 5.1.2. 


One can make C(R/Z; C) into a metric space by re-introducing the
now familiar sup-norm metric

$$d_\infty(f, g) := \sup_{x \in \mathbf{R}} |f(x) - g(x)| = \sup_{x \in [0,1)} |f(x) - g(x)|$$

of uniform convergence. (Why is the first supremum the same as the
second?) See Exercise 5.1.3.


— Exercises — 
Exercise 5.1.1. Show that every real number x can be written in exactly one
way in the form x = k + y, where k is an integer and y ∈ [0, 1). (Hint: to prove
existence of such a representation, set k := sup{l ∈ Z : l ≤ x}.)

Exercise 5.1.2. Prove Lemma 5.1.5. (Hint: for (a), first show that f is bounded
on [0, 1].)

Exercise 5.1.3. Show that C(R/Z; C) with the sup-norm metric $d_\infty$ is a metric
space. Furthermore, show that this metric space is complete.


5.2 Inner products on periodic functions 


From Lemma 5.1.5 we know that we can add, subtract, multiply, and 
take limits of continuous periodic functions. We will need a couple more 
operations on the space C(R/Z;C), though. The first one is that of 
inner product. 


Definition 5.2.1 (Inner product). If f, g ∈ C(R/Z; C), we define the
inner product ⟨f, g⟩ to be the quantity

$$\langle f, g \rangle := \int_{[0,1]} f(x) \overline{g(x)} \, dx.$$


Remark 5.2.2. In order to integrate a complex-valued function f(x) =
g(x) + ih(x), we use the definition that $\int_{[a,b]} f := \int_{[a,b]} g + i \int_{[a,b]} h$; i.e., we
integrate the real and imaginary parts of the function separately. Thus
for instance $\int_{[1,2]} (1 + ix) \, dx = \int_{[1,2]} 1 \, dx + i \int_{[1,2]} x \, dx = 1 + \frac{3}{2} i$. It is easy
to verify that all the standard rules of calculus (integration by parts,
fundamental theorem of calculus, substitution, etc.) still hold when the
functions are complex-valued instead of real-valued.


Example 5.2.3. Let f be the constant function f(x) := 1, and let g
be the function g(x) := $e^{2\pi i x}$. Then we have

$$\begin{aligned}
\langle f, g \rangle &= \int_{[0,1]} 1 \cdot \overline{e^{2\pi i x}} \, dx \\
&= \int_{[0,1]} e^{-2\pi i x} \, dx \\
&= \frac{e^{-2\pi i x}}{-2\pi i} \Big|_{x=0}^{x=1} \\
&= \frac{e^{-2\pi i} - e^{0}}{-2\pi i} \\
&= \frac{1 - 1}{-2\pi i} \\
&= 0.
\end{aligned}$$
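
The computation in Example 5.2.3 can be checked numerically. The following Python sketch approximates the inner product by a left-endpoint Riemann sum; the function names and the number of sample points are assumptions made for illustration, and the result is only an approximation of the integral.

```python
import cmath


def inner_product(f, g, samples=1000):
    """Riemann-sum approximation of the integral of f(x) * conj(g(x)) over [0, 1]."""
    total = 0j
    for j in range(samples):
        x = j / samples
        total += f(x) * g(x).conjugate()
    return total / samples


def one(x):
    # The constant function f(x) := 1, returned as a complex number.
    return 1 + 0j


def character(n):
    # The function x -> e^{2 pi i n x}.
    return lambda x: cmath.exp(2j * cmath.pi * n * x)


print(abs(inner_product(one, character(1))))   # approximately 0
print(abs(inner_product(one, one)))            # approximately 1
```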


Remark 5.2.4. In general, the inner product ⟨f, g⟩ will be a complex
number. (Note that $f(x) \overline{g(x)}$ will be Riemann integrable since both
functions are bounded and continuous.)


Roughly speaking, the inner product ⟨f, g⟩ is to the space C(R/Z; C)
what the dot product x · y is to Euclidean spaces such as $\mathbf{R}^n$. We list
some basic properties of the inner product below; a more in-depth study 
of inner products on vector spaces can be found in any linear algebra 
text but is beyond the scope of this text. 

Lemma 5.2.5. Let f, g, h ∈ C(R/Z; C).


(a) (Hermitian property) We have $\langle g, f \rangle = \overline{\langle f, g \rangle}$.


(b) (Positivity) We have ⟨f, f⟩ ≥ 0. Furthermore, we have ⟨f, f⟩ = 0
if and only if f = 0 (i.e., f(x) = 0 for all x ∈ R).


(c) (Linearity in the first variable) We have ⟨f + g, h⟩ = ⟨f, h⟩ + ⟨g, h⟩.
For any complex number c, we have ⟨cf, g⟩ = c⟨f, g⟩.


(d) (Antilinearity in the second variable) We have ⟨f, g + h⟩ = ⟨f, g⟩ +
⟨f, h⟩. For any complex number c, we have $\langle f, cg \rangle = \overline{c} \langle f, g \rangle$.


Proof. See Exercise 5.2.1. 


From the positivity property, it makes sense to define the $L^2$ norm
$\|f\|_2$ of a function f ∈ C(R/Z; C) by the formula

$$\|f\|_2 := \sqrt{\langle f, f \rangle} = \left( \int_{[0,1]} f(x) \overline{f(x)} \, dx \right)^{1/2} = \left( \int_{[0,1]} |f(x)|^2 \, dx \right)^{1/2}.$$

Thus $\|f\|_2 \geq 0$ for all f. The norm $\|f\|_2$ is sometimes called the root
mean square of f.


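Example 5.2.6. If f(x) is the function $e^{2\pi i x}$, then

$$\|f\|_2 = \left( \int_{[0,1]} e^{2\pi i x} e^{-2\pi i x} \, dx \right)^{1/2} = \left( \int_{[0,1]} 1 \, dx \right)^{1/2} = 1^{1/2} = 1.$$


This $L^2$ norm is related to, but is distinct from, the $L^\infty$ norm
$\|f\|_\infty := \sup_{x \in \mathbf{R}} |f(x)|$. For instance, if f(x) = sin(2πx), then $\|f\|_\infty = 1$
but $\|f\|_2 = \frac{1}{\sqrt{2}}$. In general, the best one can say is that
$0 \leq \|f\|_2 \leq \|f\|_\infty$; see Exercise 5.2.3.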
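
As a numerical sanity check of the comparison between the two norms, the following Python sketch estimates both for the 1-periodic function sin(2πx). It is a rough illustration only: the helper names and the sample count are assumptions, the $L^2$ norm is approximated by a Riemann sum, and the sampled maximum is in general only a lower bound for the true supremum.

```python
import math


def l2_norm(f, samples=10000):
    """Riemann-sum approximation of the L^2 norm of f over [0, 1]."""
    total = sum(abs(f(j / samples)) ** 2 for j in range(samples))
    return math.sqrt(total / samples)


def sup_norm(f, samples=10000):
    """Maximum of |f| over a grid in [0, 1); a lower bound for the sup norm."""
    return max(abs(f(j / samples)) for j in range(samples))


def f(x):
    return math.sin(2 * math.pi * x)


print(l2_norm(f))    # approximately 0.7071, i.e. 1 / sqrt(2)
print(sup_norm(f))   # approximately 1
```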

Some basic properties of the L? norm are given below. 


Lemma 5.2.7. Let f, g ∈ C(R/Z; C).


(a) (Non-degeneracy) We have $\|f\|_2 = 0$ if and only if f = 0.


(b) (Cauchy-Schwarz inequality) We have $|\langle f, g \rangle| \leq \|f\|_2 \|g\|_2$.


(c) (Triangle inequality) We have $\|f + g\|_2 \leq \|f\|_2 + \|g\|_2$.


(d) (Pythagoras’ theorem) If ⟨f, g⟩ = 0, then $\|f + g\|_2^2 = \|f\|_2^2 + \|g\|_2^2$.


(e) (Homogeneity) We have $\|cf\|_2 = |c| \|f\|_2$ for all c ∈ C.


Proof. See Exercise 5.2.4. 


In light of Pythagoras’ theorem, we sometimes say that f and g are
orthogonal iff ⟨f, g⟩ = 0.

We can now define the $L^2$ metric $d_{L^2}$ on C(R/Z; C) by defining

$$d_{L^2}(f, g) := \|f - g\|_2 = \left( \int_{[0,1]} |f(x) - g(x)|^2 \, dx \right)^{1/2}.$$


Remark 5.2.8. One can verify that $d_{L^2}$ is indeed a metric (Exercise
5.2.2). Indeed, the $L^2$ metric is very similar to the $l^2$ metric on Euclidean
spaces $\mathbf{R}^n$, which is why the notation is deliberately chosen to be similar;
you should compare the two metrics yourself to see the analogy.


Note that a sequence $f_n$ of functions in C(R/Z; C) will converge in
the $L^2$ metric to f ∈ C(R/Z; C) if $d_{L^2}(f_n, f) \to 0$ as n → ∞, or in
other words that

$$\lim_{n \to \infty} \int_{[0,1]} |f_n(x) - f(x)|^2 \, dx = 0.$$


Remark 5.2.9. The notion of convergence in $L^2$ metric is different from
that of uniform or pointwise convergence; see Exercise 5.2.6.


Remark 5.2.10. The $L^2$ metric is not as well-behaved as the $L^\infty$ metric.
For instance, it turns out the space C(R/Z; C) is not complete in the
$L^2$ metric, despite being complete in the $L^\infty$ metric; see Exercise 5.2.5.


— Exercises — 


Exercise 5.2.1. Prove Lemma 5.2.5. (Hint: the last part of (b) is a little tricky.
You may need to prove by contradiction, assuming that f is not the zero
function, and then show that $\int_{[0,1]} |f(x)|^2 \, dx$ is strictly positive. You will need to
use the fact that f, and hence |f|, is continuous, to do this.)


Exercise 5.2.2. Prove that the $L^2$ metric $d_{L^2}$ on C(R/Z; C) does indeed turn
C(R/Z; C) into a metric space. (cf. Exercise 1.1.6).


Exercise 5.2.3. If f ∈ C(R/Z; C) is a non-zero function, show that
$0 < \|f\|_2 \leq \|f\|_\infty$. Conversely, if 0 < A ≤ B are real numbers, show that there exists a
non-zero function f ∈ C(R/Z; C) such that $\|f\|_2 = A$ and $\|f\|_\infty = B$. (Hint:
let g be a non-constant non-negative real-valued function in C(R/Z; C), and
consider functions f of the form $f = (c + dg)^{1/2}$ for some constant real numbers
c, d > 0.)


Exercise 5.2.4. Prove Lemma 5.2.7. (Hint: use Lemma 5.2.5 frequently. For
the Cauchy-Schwarz inequality, begin with the positivity property ⟨f, f⟩ ≥ 0,
but with f replaced by the function $f \|g\|_2^2 - \langle f, g \rangle g$, and then simplify using
Lemma 5.2.5. You may have to treat the case $\|g\|_2 = 0$ separately. Use the
Cauchy-Schwarz inequality to prove the triangle inequality.)


Exercise 5.2.5. Find a sequence of continuous periodic functions which converge
in $L^2$ to a discontinuous periodic function. (Hint: try converging to the square
wave function.)


Exercise 5.2.6. Let f ∈ C(R/Z; C), and let $(f_n)_{n=1}^{\infty}$ be a sequence of functions
in C(R/Z; C).


(a) Show that if $f_n$ converges uniformly to f, then $f_n$ also converges to f in
the $L^2$ metric.


(b) Give an example where $f_n$ converges to f in the $L^2$ metric, but does not
converge to f uniformly. (Hint: take f = 0. Try to make the functions
$f_n$ large in sup norm.)


(c) Give an example where $f_n$ converges to f in the $L^2$ metric, but does not
converge to f pointwise. (Hint: take f = 0. Try to make the functions
$f_n$ large at one point.)


(d) Give an example where $f_n$ converges to f pointwise, but does not con-
verge to f in the $L^2$ metric. (Hint: take f = 0. Try to make the functions
$f_n$ large in $L^2$ norm.)


5.3 Trigonometric polynomials


We now define the concept of a trigonometric polynomial. Just as poly-
nomials are combinations of the functions $x^n$ (sometimes called monomi-
als), trigonometric polynomials are combinations of the functions $e^{2\pi i n x}$
(sometimes called characters).


Definition 5.3.1 (Characters). For every integer n, we let
$e_n \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ denote the function

$$e_n(x) := e^{2\pi i n x}.$$

This is sometimes referred to as the character with frequency n.


Definition 5.3.2 (Trigonometric polynomials). A function f in
C(R/Z; C) is said to be a trigonometric polynomial if we can write
$f = \sum_{n=-N}^{N} c_n e_n$ for some integer N ≥ 0 and some complex numbers
$(c_n)_{n=-N}^{N}$.


Example 5.3.3. The function $f = 4e_{-2} + ie_{-1} - 2e_0 + 0e_1 - 3e_2$ is a
trigonometric polynomial; it can be written more explicitly as

$$f(x) = 4e^{-4\pi i x} + ie^{-2\pi i x} - 2 - 3e^{4\pi i x}.$$


Example 5.3.4. For any integer n, the function cos(2πnx) is a trigono-
metric polynomial, since

$$\cos(2\pi n x) = \frac{e^{2\pi i n x} + e^{-2\pi i n x}}{2} = \frac{1}{2} e_{-n} + \frac{1}{2} e_n.$$

Similarly the function $\sin(2\pi n x) = \frac{-1}{2i} e_{-n} + \frac{1}{2i} e_n$ is a trigonometric
polynomial. In fact, any linear combination of sines and cosines is also
a trigonometric polynomial, for instance 3 + i cos(2πx) + 4i sin(4πx) is
a trigonometric polynomial.


The Fourier theorem will allow us to write any function in C(R/Z; C) 
as a Fourier series, which is to trigonometric polynomials what power se- 
ries is to polynomials. To do this we will use the inner product structure 
from the previous section. The key computation is 


Lemma 5.3.5 (Characters are an orthonormal system). For any inte-
gers n and m, we have $\langle e_n, e_m \rangle = 1$ when n = m and $\langle e_n, e_m \rangle = 0$ when
n ≠ m. Also, we have $\|e_n\|_2 = 1$.


Proof. See Exercise 5.3.2.


As a consequence, we have a formula for the coefficients of a trigono-
metric polynomial.


Corollary 5.3.6. Let $f = \sum_{n=-N}^{N} c_n e_n$ be a trigonometric polynomial.
Then we have the formula

$$c_n = \langle f, e_n \rangle$$

for all integers −N ≤ n ≤ N. Also, we have $0 = \langle f, e_n \rangle$ whenever
n > N or n < −N. Also, we have the identity

$$\|f\|_2^2 = \sum_{n=-N}^{N} |c_n|^2.$$


Proof. See Exercise 5.3.3.
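
The orthonormality of the characters and the coefficient formula of Corollary 5.3.6 can be checked numerically on the trigonometric polynomial of Example 5.3.3. The Python sketch below is an illustration under stated assumptions: inner products are approximated by Riemann sums on a hypothetical grid of 256 points (which happens to be exact, up to rounding, for characters of frequency difference below 256), and all names are chosen for this example only.

```python
import cmath

SAMPLES = 256  # hypothetical grid size for the Riemann sums


def character(n):
    # e_n(x) := e^{2 pi i n x}
    return lambda x: cmath.exp(2j * cmath.pi * n * x)


def inner_product(f, g, samples=SAMPLES):
    # Riemann-sum approximation of the integral of f(x) * conj(g(x)) over [0, 1].
    return sum(f(j / samples) * g(j / samples).conjugate()
               for j in range(samples)) / samples


# Coefficients c_n of f = 4 e_{-2} + i e_{-1} - 2 e_0 + 0 e_1 - 3 e_2.
coefficients = {-2: 4, -1: 1j, 0: -2, 1: 0, 2: -3}


def f(x):
    return sum(c * character(n)(x) for n, c in coefficients.items())


# <f, e_n> should recover c_n for |n| <= 2 and vanish for |n| > 2.
recovered = {n: inner_product(f, character(n)) for n in range(-3, 4)}

# Both sides of the identity ||f||_2^2 = sum of |c_n|^2.
norm_squared = inner_product(f, f).real
sum_of_squares = sum(abs(c) ** 2 for c in coefficients.values())

print(abs(inner_product(character(1), character(1))))   # approximately 1
print(abs(inner_product(character(1), character(2))))   # approximately 0
print(norm_squared, sum_of_squares)                     # both approximately 30
```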
We rewrite the conclusion of this corollary in a different way. 


Definition 5.3.7 (Fourier transform). For any function f ∈
C(R/Z; C), and any integer n ∈ Z, we define the n-th Fourier coeffi-
cient of f, denoted $\hat{f}(n)$, by the formula

$$\hat{f}(n) := \langle f, e_n \rangle = \int_{[0,1]} f(x) e^{-2\pi i n x} \, dx.$$

The function $\hat{f} : \mathbf{Z} \to \mathbf{C}$ is called the Fourier transform of f.


From Corollary 5.3.6, we see that whenever $f = \sum_{n=-N}^{N} c_n e_n$ is a
trigonometric polynomial, we have

$$f = \sum_{n=-N}^{N} \langle f, e_n \rangle e_n = \sum_{n=-\infty}^{\infty} \langle f, e_n \rangle e_n$$

and in particular we have the Fourier inversion formula

$$f = \sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$$

or in other words

$$f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{2\pi i n x}.$$

The right-hand side is referred to as the Fourier series of f. Also, from
the second identity of Corollary 5.3.6 we have the Plancherel formula

$$\|f\|_2^2 = \sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2.$$


Remark 5.3.8. We stress that at present we have only proven the
Fourier inversion and Plancherel formulae in the case when f is a
trigonometric polynomial. Note that in this case the Fourier coef-
ficients $\hat{f}(n)$ are mostly zero (indeed, they can only be non-zero when
−N ≤ n ≤ N), and so this infinite sum is really just a finite sum in
disguise. In particular there are no issues about what sense the above
series converge in; they both converge pointwise, uniformly, and in $L^2$
metric, since they are just finite sums.


In the next few sections we will extend the Fourier inversion and 
Plancherel formulae to general functions in C(R/Z; C), not just trigono- 
metric polynomials. (It is also possible to extend the formula to discon- 
tinuous functions such as the square wave, but we will not do so here). To 
do this we will need a version of the Weierstrass approximation theorem, 
this time requiring that a continuous periodic function be approximated 
uniformly by trigonometric polynomials. Just as convolutions were used 
in the proof of the polynomial Weierstrass approximation theorem, we 
will also need a notion of convolution tailored for periodic functions. 


— Exercises — 


Exercise 5.3.1. Show that the sum or product of any two trigonometric poly- 
nomials is again a trigonometric polynomial. 


Exercise 5.3.2. Prove Lemma 5.3.5. 


Exercise 5.3.3. Prove Corollary 5.3.6. (Hint: use Lemma 5.3.5. For the
second identity, either use Pythagoras’ theorem and induction, or substitute
$f = \sum_{n=-N}^{N} c_n e_n$ and expand everything out.)


5.4 Periodic convolutions 


The goal of this section is to prove the Weierstrass approximation the- 
orem for trigonometric polynomials: 


Theorem 5.4.1. Let f ∈ C(R/Z; C), and let ε > 0. Then there exists
a trigonometric polynomial P such that $\|f - P\|_\infty \leq \varepsilon$.


This theorem asserts that any continuous periodic function can be
uniformly approximated by trigonometric polynomials. To put it an-
other way, if we let P(R/Z; C) denote the space of all trigonomet-
ric polynomials, then the closure of P(R/Z; C) in the $L^\infty$ metric is
C(R/Z; C).

It is possible to prove this theorem directly from the Weierstrass
approximation theorem for polynomials (Theorem 3.8.3), and both the-
orems are special cases of a much more general theorem known as the
Stone-Weierstrass theorem, which we will not discuss here. However, we
shall instead prove this theorem from scratch, in order to introduce a
couple of interesting notions, notably that of periodic convolution. The
proof here, though, should strongly remind you of the arguments used
to prove Theorem 3.8.3.


Definition 5.4.2 (Periodic convolution). Let f, g ∈ C(R/Z; C). Then
we define the periodic convolution f * g : R → C of f and g by the
formula

$$f * g(x) := \int_{[0,1]} f(y) g(x - y) \, dy.$$

Remark 5.4.3. Note that this formula is slightly different from the con- 
volution for compactly supported functions defined in Definition 3.8.9, 
because we are only integrating over [0,1] and not on all of R. Thus, 
in principle we have given the symbol f * g two conflicting meanings. 
However, in practice there will be no confusion, because it is not possi- 
ble for a non-zero function to both be periodic and compactly supported 
(Exercise 5.4.1). 


Lemma 5.4.4 (Basic properties of periodic convolution). Let f, g, h ∈
C(R/Z; C).


(a) (Closure) The convolution f * g is continuous and Z-periodic. In
other words, f * g ∈ C(R/Z; C).


(b) (Commutativity) We have f * g = g * f.


(c) (Bilinearity) We have f * (g + h) = f * g + f * h and (f + g) * h =
f * h + g * h. For any complex number c, we have c(f * g) =
(cf) * g = f * (cg).


Proof. See Exercise 5.4.2.


Now we observe an interesting identity: for any f ∈ C(R/Z; C) and
any integer n, we have

$$f * e_n = \hat{f}(n) e_n.$$

To prove this, we compute

$$\begin{aligned}
f * e_n(x) &= \int_{[0,1]} f(y) e^{2\pi i n (x - y)} \, dy \\
&= e^{2\pi i n x} \int_{[0,1]} f(y) e^{-2\pi i n y} \, dy = \hat{f}(n) e^{2\pi i n x} = \hat{f}(n) e_n(x)
\end{aligned}$$

as desired.

More generally, we see from Lemma 5.4.4(c) that for any trigono-
metric polynomial $P = \sum_{n=-N}^{N} c_n e_n$, we have

$$f * P = \sum_{n=-N}^{N} c_n (f * e_n) = \sum_{n=-N}^{N} \hat{f}(n) c_n e_n.$$

Thus the periodic convolution of any function in C(R/Z; C) with a
trigonometric polynomial is again a trigonometric polynomial. (Com-
pare with Lemma 3.8.13.)
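
The identity relating convolution with a character to the corresponding Fourier coefficient can be observed numerically. In the Python sketch below, the integrals are replaced by Riemann sums over a hypothetical grid of 512 points, and the test function exp(cos(2πx)) is an arbitrary choice of a continuous 1-periodic function; none of the names come from the text.

```python
import cmath
import math

SAMPLES = 512  # hypothetical grid size for the Riemann sums


def character(n):
    # e_n(x) := e^{2 pi i n x}
    return lambda x: cmath.exp(2j * cmath.pi * n * x)


def fourier_coefficient(f, n, samples=SAMPLES):
    # Approximates the integral of f(y) e^{-2 pi i n y} over [0, 1].
    return sum(f(j / samples) * cmath.exp(-2j * cmath.pi * n * j / samples)
               for j in range(samples)) / samples


def periodic_convolution(f, g, samples=SAMPLES):
    # Approximates (f * g)(x), the integral of f(y) g(x - y) over [0, 1].
    def convolved(x):
        return sum(f(j / samples) * g(x - j / samples)
                   for j in range(samples)) / samples
    return convolved


def f(x):
    # An arbitrary continuous 1-periodic test function.
    return math.exp(math.cos(2 * math.pi * x))


n, x = 3, 0.3
left = periodic_convolution(f, character(n))(x)
right = fourier_coefficient(f, n) * character(n)(x)
print(abs(left - right))   # approximately 0
```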

Next, we introduce the periodic analogue of an approximation to the 
identity. 


Definition 5.4.5 (Periodic approximation to the identity). Let ε > 0
and 0 < δ < 1/2. A function f ∈ C(R/Z; C) is said to be a periodic
(ε, δ) approximation to the identity if the following properties are true:


(a) f(x) ≥ 0 for all x ∈ R, and $\int_{[0,1]} f = 1$.

(b) We have f(x) < ε for all δ ≤ |x| ≤ 1 − δ.


Now we have an analogue of Lemma 3.8.8:


Lemma 5.4.6. For every ε > 0 and 0 < δ < 1/2, there exists a trigono-
metric polynomial P which is an (ε, δ) approximation to the identity.


Proof. We sketch the proof of this lemma here, and leave the completion
of it to Exercise 5.4.3. Let N ≥ 1 be an integer. We define the Fejér
kernel $F_N$ to be the function

$$F_N := \sum_{n=-N}^{N} \left( 1 - \frac{|n|}{N} \right) e_n.$$

Clearly $F_N$ is a trigonometric polynomial. We observe the identity

$$F_N = \frac{1}{N} \left| \sum_{n=0}^{N-1} e_n \right|^2$$

(why?). But from the geometric series formula (Lemma 7.3.3) we have

$$\sum_{n=0}^{N-1} e_n(x) = \frac{e_N - e_0}{e_1 - e_0} = \frac{e^{\pi i (N-1) x} \sin(\pi N x)}{\sin(\pi x)}$$

when x is not an integer (why?), and hence we have the formula

$$F_N(x) = \frac{\sin(\pi N x)^2}{N \sin(\pi x)^2}.$$

When x is an integer, the geometric series formula does not apply, but
one has $F_N(x) = N$ in that case, as one can see by direct computation.
In either case we see that $F_N(x) \geq 0$ for any x. Also, we have

$$\int_{[0,1]} F_N(x) \, dx = \sum_{n=-N}^{N} \left( 1 - \frac{|n|}{N} \right) \int_{[0,1]} e_n = \left( 1 - \frac{|0|}{N} \right) 1 = 1$$

(why?). Finally, since $|\sin(\pi N x)| \leq 1$, we have

$$F_N(x) \leq \frac{1}{N \sin(\pi x)^2} \leq \frac{1}{N \sin(\pi \delta)^2}$$

whenever δ ≤ |x| ≤ 1 − δ (this is because sin is increasing on [0, π/2]
and decreasing on [π/2, π]). Thus by choosing N large enough, we can
make $F_N(x) < \varepsilon$ for all δ ≤ |x| ≤ 1 − δ.
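
The properties of the Fejér kernel used in this sketch of proof can be spot-checked numerically. The Python code below is illustrative only: the function names, the grid sizes, and the sample values N = 8, N = 50 and δ = 0.1 are assumptions, and a numerical check of this kind is of course no substitute for the argument in Exercise 5.4.3.

```python
import math


def fejer_sum(N, x):
    # The defining sum over n = -N..N of (1 - |n|/N) e_n(x). The terms for n
    # and -n combine to a cosine, and the terms with n = N or n = -N vanish.
    return 1 + 2 * sum((1 - n / N) * math.cos(2 * math.pi * n * x)
                       for n in range(1, N))


def fejer_closed_form(N, x):
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:          # x is (numerically) an integer
        return float(N)
    return math.sin(math.pi * N * x) ** 2 / (N * s ** 2)


# The two expressions agree, e.g. at x = 0.3 with N = 8.
print(fejer_sum(8, 0.3), fejer_closed_form(8, 0.3))

# Riemann-sum approximation of the integral of F_8 over [0, 1].
integral = sum(fejer_sum(8, j / 64) for j in range(64)) / 64
print(integral)   # approximately 1

# The bound F_N(x) <= 1 / (N sin(pi delta)^2) on delta <= x <= 1 - delta.
N, delta = 50, 0.1
bound = 1 / (N * math.sin(math.pi * delta) ** 2)
grid = [delta + (1 - 2 * delta) * j / 200 for j in range(201)]
bound_holds = all(fejer_closed_form(N, x) <= bound + 1e-12 for x in grid)
print(bound_holds)
```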


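Proof of Theorem 5.4.1. Let f be any element of C(R/Z; C); we know
that f is bounded, so that we have some M > 0 such that |f(x)| ≤ M
for all x ∈ R.

Let ε > 0 be arbitrary. Since f is uniformly continuous, there exists a
δ > 0 such that |f(x) − f(y)| ≤ ε whenever |x − y| ≤ δ; shrinking δ if
necessary, we may assume that δ < 1/2. Now use Lemma
5.4.6 to find a trigonometric polynomial P which is a (ε, δ) approxima-
tion to the identity. Then f * P is also a trigonometric polynomial. We
now estimate $\|f - f * P\|_\infty$.

Let x be any real number. We have

$$\begin{aligned}
|f(x) - f * P(x)| &= |f(x) - P * f(x)| \\
&= \left| f(x) - \int_{[0,1]} f(x - y) P(y) \, dy \right| \\
&= \left| \int_{[0,1]} f(x) P(y) \, dy - \int_{[0,1]} f(x - y) P(y) \, dy \right| \\
&= \left| \int_{[0,1]} (f(x) - f(x - y)) P(y) \, dy \right| \\
&\leq \int_{[0,1]} |f(x) - f(x - y)| P(y) \, dy.
\end{aligned}$$

The right-hand side can be split as

$$\int_{[0,\delta]} |f(x) - f(x - y)| P(y) \, dy + \int_{[\delta,1-\delta]} |f(x) - f(x - y)| P(y) \, dy + \int_{[1-\delta,1]} |f(x) - f(x - y)| P(y) \, dy$$

which we can bound from above by

$$\begin{aligned}
&\leq \int_{[0,\delta]} \varepsilon P(y) \, dy + \int_{[\delta,1-\delta]} 2M\varepsilon \, dy + \int_{[1-\delta,1]} |f(x - 1) - f(x - y)| P(y) \, dy \\
&\leq \int_{[0,\delta]} \varepsilon P(y) \, dy + \int_{[\delta,1-\delta]} 2M\varepsilon \, dy + \int_{[1-\delta,1]} \varepsilon P(y) \, dy \\
&\leq \varepsilon + 2M\varepsilon + \varepsilon \\
&= (2M + 2)\varepsilon.
\end{aligned}$$

Thus we have $\|f - f * P\|_\infty \leq (2M + 2)\varepsilon$. Since M is fixed and ε is
arbitrary, we can thus make f * P arbitrarily close to f in sup norm,
which proves the periodic Weierstrass approximation theorem.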
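
To see the approximation scheme of this proof in action, one can convolve a concrete continuous periodic function with Fejér kernels and watch the sup-norm error shrink. The Python sketch below makes several assumptions that are not in the text: the test function is the triangle wave equal to |1 − 2x| on [0, 1), the convolution integral is replaced by a Riemann sum, and the sup norm is only sampled at finitely many points.

```python
import math


def fejer_kernel(N, x):
    # Sum over n = -N..N of (1 - |n|/N) e_n(x), written with cosines.
    return 1 + 2 * sum((1 - n / N) * math.cos(2 * math.pi * n * x)
                       for n in range(1, N))


def f(x):
    # Triangle wave: |1 - 2y| on [0, 1), extended periodically.
    y = x - math.floor(x)
    return abs(1 - 2 * y)


def convolve_with_fejer(N, x, samples=256):
    # Riemann-sum approximation of (f * F_N)(x).
    return sum(f(j / samples) * fejer_kernel(N, x - j / samples)
               for j in range(samples)) / samples


def sampled_sup_error(N, points=32):
    # Largest value of |f - f * F_N| over a grid of sample points.
    return max(abs(f(i / points) - convolve_with_fejer(N, i / points))
               for i in range(points))


error_for_4 = sampled_sup_error(4)
error_for_32 = sampled_sup_error(32)
print(error_for_4, error_for_32)   # the second error is the smaller one
```

The error decays slowly for this function because of its corners at the integers and half-integers, which is consistent with the theorem: it guarantees that the error can be made arbitrarily small, but says nothing about the rate.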


— Exercises — 


Exercise 5.4.1. Show that if f : R → C is both compactly supported and
Z-periodic, then it is identically zero.


Exercise 5.4.2. Prove Lemma 5.4.4. (Hint: to prove that f * g is continuous,
you will have to do something like use the fact that f is bounded, and g is
uniformly continuous, or vice versa. To prove that f * g = g * f, you will need
to use the periodicity to “cut and paste” the interval [0, 1].)


Exercise 5.4.3. Fill in the gaps marked (why?) in Lemma 5.4.6. (Hint: for the
first identity, use the identities $|z|^2 = z \overline{z}$, $\overline{e_n} = e_{-n}$, and $e_n e_m = e_{n+m}$.)


5.5 The Fourier and Plancherel theorems 


Using the Weierstrass approximation theorem (Theorem 5.4.1), we can 
now generalize the Fourier and Plancherel identities to arbitrary contin- 
uous periodic functions. 


Theorem 5.5.1 (Fourier theorem). For any f ∈ C(R/Z; C), the series
$\sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$ converges in $L^2$ metric to f. In other words, we have

$$\lim_{N \to \infty} \left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 = 0.$$


Proof. Let ε > 0. We have to show that there exists an $N_0$ such that
$\left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 \leq \varepsilon$ for all $N > N_0$.

By the Weierstrass approximation theorem (Theorem 5.4.1), we
can find a trigonometric polynomial $P = \sum_{n=-N_0}^{N_0} c_n e_n$ such that
$\|f - P\|_\infty \leq \varepsilon$, for some $N_0 > 0$. In particular we have $\|f - P\|_2 \leq \varepsilon$.

Now let $N > N_0$, and let $F_N := \sum_{n=-N}^{N} \hat{f}(n) e_n$. We claim that
$\|f - F_N\|_2 \leq \varepsilon$. First observe that for any |m| ≤ N, we have

$$\langle f - F_N, e_m \rangle = \langle f, e_m \rangle - \sum_{n=-N}^{N} \hat{f}(n) \langle e_n, e_m \rangle = \hat{f}(m) - \hat{f}(m) = 0,$$

where we have used Lemma 5.3.5. In particular we have

$$\langle f - F_N, F_N - P \rangle = 0$$

since we can write $F_N - P$ as a linear combination of the $e_m$ for which
|m| ≤ N. By Pythagoras’ theorem we therefore have

$$\|f - P\|_2^2 = \|f - F_N\|_2^2 + \|F_N - P\|_2^2$$

and in particular

$$\|f - F_N\|_2 \leq \|f - P\|_2 \leq \varepsilon$$

as desired.


Remark 5.5.2. Note that we have only obtained convergence of the
Fourier series $\sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$ to f in the $L^2$ metric. One may ask
whether one has convergence in the uniform or pointwise sense as well,
but it turns out (perhaps somewhat surprisingly) that the answer is no
to both of those questions. However, if one assumes that the function f
is not only continuous, but is also continuously differentiable, then one
can recover pointwise convergence; if one assumes continuously twice
differentiable, then one gets uniform convergence as well. These results
are beyond the scope of this text and will not be proven here. How-
ever, we will prove one theorem about when one can improve the $L^2$
convergence to uniform convergence:

Theorem 5.5.3. Let f ∈ C(R/Z; C), and suppose that the series
$\sum_{n=-\infty}^{\infty} |\hat{f}(n)|$ is absolutely convergent. Then the series $\sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$
converges uniformly to f. In other words, we have

$$\lim_{N \to \infty} \left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_\infty = 0.$$


Proof. By the Weierstrass M-test (Theorem 3.5.7), we see that
$\sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$ converges to some function F, which by Lemma 5.1.5(c)
is also continuous and Z-periodic. (Strictly speaking, the Weierstrass
M-test was phrased for series from n = 1 to n = ∞, but also works for se-
ries from n = −∞ to n = +∞; this can be seen by splitting the doubly
infinite series into two pieces.) Thus

$$\lim_{N \to \infty} \left\| F - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_\infty = 0$$

which implies that

$$\lim_{N \to \infty} \left\| F - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 = 0$$

since the $L^2$ norm is always less than or equal to the $L^\infty$ norm. But the
sequence $\sum_{n=-N}^{N} \hat{f}(n) e_n$ is already converging in $L^2$ metric to f by the
Fourier theorem, so can only converge in $L^2$ metric to F if F = f (cf.
Proposition 1.1.20). Thus F = f, and so we have

$$\lim_{N \to \infty} \left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_\infty = 0$$

as desired.


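As a corollary of the Fourier theorem, we obtain


Theorem 5.5.4 (Plancherel theorem). For any f ∈ C(R/Z; C), the
series $\sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2$ is absolutely convergent, and

$$\|f\|_2^2 = \sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2.$$

This theorem is also known as Parseval’s theorem.


Proof. Let ε > 0. By the Fourier theorem we know that

$$\left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 \leq \varepsilon$$

if N is large enough (depending on ε). In particular, by the triangle
inequality this implies that

$$\|f\|_2 - \varepsilon \leq \left\| \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 \leq \|f\|_2 + \varepsilon.$$

On the other hand, by Corollary 5.3.6 we have

$$\left\| \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 = \left( \sum_{n=-N}^{N} |\hat{f}(n)|^2 \right)^{1/2}$$

and hence

$$(\|f\|_2 - \varepsilon)^2 \leq \sum_{n=-N}^{N} |\hat{f}(n)|^2 \leq (\|f\|_2 + \varepsilon)^2.$$

Taking lim sup, we obtain

$$(\|f\|_2 - \varepsilon)^2 \leq \limsup_{N \to \infty} \sum_{n=-N}^{N} |\hat{f}(n)|^2 \leq (\|f\|_2 + \varepsilon)^2.$$

Since ε is arbitrary, we thus obtain by the squeeze test that

$$\limsup_{N \to \infty} \sum_{n=-N}^{N} |\hat{f}(n)|^2 = \|f\|_2^2$$

and the claim follows.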
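
The Fourier and Plancherel theorems can be illustrated numerically on a continuous periodic function that is not a trigonometric polynomial. The Python sketch below uses the triangle wave equal to |1 − 2x| on [0, 1), an arbitrary choice made for this illustration; the Fourier coefficients and the norm are approximated by Riemann sums on a hypothetical grid of 2048 points, so all printed values are approximations.

```python
import cmath
import math

SAMPLES = 2048  # hypothetical grid size for the Riemann sums


def f(x):
    # Triangle wave: |1 - 2y| on [0, 1), extended periodically.
    y = x - math.floor(x)
    return abs(1 - 2 * y)


def fourier_coefficient(n):
    # Approximates the integral of f(x) e^{-2 pi i n x} over [0, 1].
    return sum(f(j / SAMPLES) * cmath.exp(-2j * cmath.pi * n * j / SAMPLES)
               for j in range(SAMPLES)) / SAMPLES


# Approximation of the squared L^2 norm of f; the exact value is 1/3.
norm_squared = sum(f(j / SAMPLES) ** 2 for j in range(SAMPLES)) / SAMPLES


def partial_plancherel_sum(N):
    # Sum of |f^(n)|^2 over -N <= n <= N.
    return sum(abs(fourier_coefficient(n)) ** 2 for n in range(-N, N + 1))


for N in (1, 5, 20):
    # By Pythagoras' theorem, this difference is the squared L^2 distance
    # between f and its N-th partial Fourier sum.
    print(N, norm_squared - partial_plancherel_sum(N))
```

The printed differences decrease towards zero as N grows, which reflects both the $L^2$ convergence of the partial sums and the Plancherel identity.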


There are many other properties of the Fourier transform, but we 
will not develop them here. In the exercises you will see a small number 
of applications of the Fourier and Plancherel theorems. 


— Exercises — 
Exercise 5.5.1. Let f be a function in C(R/Z; C), and define the trigonometric
Fourier coefficients $a_n$, $b_n$ for n = 0, 1, 2, 3, ... by

$$a_n := 2 \int_{[0,1]} f(x) \cos(2\pi n x) \, dx; \quad b_n := 2 \int_{[0,1]} f(x) \sin(2\pi n x) \, dx.$$


(a) Show that the series

$$\frac{1}{2} a_0 + \sum_{n=1}^{\infty} (a_n \cos(2\pi n x) + b_n \sin(2\pi n x))$$

converges in $L^2$ metric to f. (Hint: use the Fourier theorem, and break
up the exponentials into sines and cosines. Combine the positive n terms
with the negative n terms.)


(b) Show that if $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are absolutely convergent, then the
above series actually converges uniformly to f, and not just in $L^2$ metric.
(Hint: use Theorem 5.5.3).


Exercise 5.5.2. Let f(x) be the function defined by $f(x) = (1 - 2x)^2$ when
x ∈ [0, 1), and extended to be Z-periodic for the rest of the real line.


(a) Using Exercise 5.5.1, show that the series

$$\frac{1}{3} + \sum_{n=1}^{\infty} \frac{4}{\pi^2 n^2} \cos(2\pi n x)$$

converges uniformly to f.

(b) Conclude that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$. (Hint: evaluate the above series at x = 0.)


(c) Conclude that $\sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}$. (Hint: expand the cosines in terms of
exponentials, and use Plancherel’s theorem.)


Exercise 5.5.3. If f ∈ C(R/Z; C) and $P = \sum_{n=-N}^{N} c_n e_n$ is a trigonometric polynomial, show
that

$$\widehat{f * P}(n) = \hat{f}(n) c_n = \hat{f}(n) \hat{P}(n)$$

for all integers n (with the convention that $c_n := 0$ when |n| > N). More generally, if f, g ∈ C(R/Z; C), show that

$$\widehat{f * g}(n) = \hat{f}(n) \hat{g}(n)$$

for all integers n. (A fancy way of saying this is that the Fourier transform
intertwines convolution and multiplication).


Exercise 5.5.4. Let f ∈ C(R/Z; C) be a function which is differentiable, and
whose derivative f′ is also continuous. Show that f′ also lies in C(R/Z; C),
and that $\widehat{f'}(n) = 2\pi i n \hat{f}(n)$ for all integers n.


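Exercise 5.5.5. Let f, g ∈ C(R/Z; C). Prove the Parseval identity

$$\mathrm{Re} \int_{[0,1]} f(x) \overline{g(x)} \, dx = \mathrm{Re} \sum_{n \in \mathbf{Z}} \hat{f}(n) \overline{\hat{g}(n)}.$$

(Hint: apply the Plancherel theorem to f + g and f − g, and subtract the two.)
Then conclude that the real parts can be removed, thus

$$\int_{[0,1]} f(x) \overline{g(x)} \, dx = \sum_{n \in \mathbf{Z}} \hat{f}(n) \overline{\hat{g}(n)}.$$

(Hint: apply the first identity with f replaced by if.)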


Exercise 5.5.6. In this exercise we shall develop the theory of Fourier series for
functions of any fixed period L.

Let L > 0, and let f : R → C be a complex-valued function which is
continuous and L-periodic. Define the numbers $c_n$ for every integer n by

$$c_n := \frac{1}{L} \int_{[0,L]} f(x) e^{-2\pi i n x / L} \, dx.$$


(a) Show that the series

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x / L}$$

converges in $L^2$ metric to f. More precisely, show that

$$\lim_{N \to \infty} \int_{[0,L]} \left| f(x) - \sum_{n=-N}^{N} c_n e^{2\pi i n x / L} \right|^2 \, dx = 0.$$

(Hint: apply the Fourier theorem to the function f(Lx).)


(b) If the series $\sum_{n=-\infty}^{\infty} |c_n|$ is absolutely convergent, show that

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x / L}$$

converges uniformly to f.

(c) Show that

$$\frac{1}{L} \int_{[0,L]} |f(x)|^2 \, dx = \sum_{n=-\infty}^{\infty} |c_n|^2.$$

(Hint: apply the Plancherel theorem to the function f(Lx).)


Chapter 6 


Several variable differential calculus 


6.1 Linear transformations 


We shall now switch to a different topic, namely that of differentiation
in several variable calculus. More precisely, we shall be dealing with
maps $f : \mathbf{R}^n \to \mathbf{R}^m$ from one Euclidean space to another, and trying to
understand what the derivative of such a map is.

Before we do so, however, we need to recall some notions from linear 
algebra, most importantly that of a linear transformation and a matrix. 
We shall be rather brief here; a more thorough treatment of this material 
can be found in any linear algebra text. 


Definition 6.1.1 (Row vectors). Let n ≥ 1 be an integer. We refer to
elements of $\mathbf{R}^n$ as n-dimensional row vectors. A typical n-dimensional
row vector may take the form $x = (x_1, x_2, \ldots, x_n)$, which we abbreviate
as $(x_i)_{1 \leq i \leq n}$; the quantities $x_1, x_2, \ldots, x_n$ are of course real numbers. If
$(x_i)_{1 \leq i \leq n}$ and $(y_i)_{1 \leq i \leq n}$ are n-dimensional row vectors, we can define
their vector sum by

$$(x_i)_{1 \leq i \leq n} + (y_i)_{1 \leq i \leq n} = (x_i + y_i)_{1 \leq i \leq n},$$

and also if c ∈ R is any scalar, we can define the scalar product
$c (x_i)_{1 \leq i \leq n}$ by

$$c (x_i)_{1 \leq i \leq n} := (c x_i)_{1 \leq i \leq n}.$$

Of course one has similar operations on $\mathbf{R}^m$ as well. However, if n ≠ m,
then we do not define any operation of vector addition between vectors
in $\mathbf{R}^n$ and vectors in $\mathbf{R}^m$ (e.g., (2, 3, 4) + (5, 6) is undefined). We also
refer to the vector (0, ..., 0) in $\mathbf{R}^n$ as the zero vector and also denote it
by 0. (Strictly speaking, we should denote the zero vector of $\mathbf{R}^n$ by $0_{\mathbf{R}^n}$,
as they are technically distinct from each other and from the number
zero, but we shall not take care to make this distinction). We abbreviate
(−1)x as −x.


The operations of vector addition and scalar multiplication obey a 
number of basic properties: 


Lemma 6.1.2 ($\mathbf{R}^n$ is a vector space). Let x, y, z be vectors in $\mathbf{R}^n$,
and let c, d be real numbers. Then we have the commutativity property
x + y = y + x, the additive associativity property (x + y) + z = x +
(y + z), the additive identity property x + 0 = 0 + x = x, the additive
inverse property x + (−x) = (−x) + x = 0, the multiplicative associativity
property (cd)x = c(dx), the distributivity properties c(x + y) = cx + cy
and (c + d)x = cx + dx, and the multiplicative identity property 1x = x.


Proof. See Exercise 6.1.1. 


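Definition 6.1.3 (Transpose). If $(x_i)_{1 \leq i \leq n} = (x_1, x_2, \ldots, x_n)$ is an n-
dimensional row vector, we can define its transpose $(x_i)_{1 \leq i \leq n}^T$ by

$$(x_i)_{1 \leq i \leq n}^T = (x_1, x_2, \ldots, x_n)^T := \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$

We refer to objects such as $(x_i)_{1 \leq i \leq n}^T$ as n-dimensional column vectors.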


Remark 6.1.4. There is no functional difference between a row vector 
and a column vector (e.g., one can add and scalar multiply column 
vectors just as well as we can row vectors), however we shall (rather 
annoyingly) need to transpose our row vectors into column vectors in 
order to be consistent with the conventions of matrix multiplication, 
which we will see later. Note that we view row vectors and column 
vectors as residing in different spaces; thus for instance we will not define 
the sum of a row vector with a column vector, even when they have the 
same number of elements. 


Definition 6.1.5 (Standard basis row vectors). We identify n special
vectors in $\mathbf{R}^n$, the standard basis row vectors $e_1, \ldots, e_n$. For each 1 ≤
j ≤ n, $e_j$ is the vector which has 0 in all entries except for the j-th entry,
which is equal to 1.


For instance, in $\mathbf{R}^3$, we have $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, and
$e_3 = (0, 0, 1)$. Note that if $x = (x_j)_{1 \leq j \leq n}$ is a vector in $\mathbf{R}^n$, then

$$x = x_1 e_1 + x_2 e_2 + \ldots + x_n e_n = \sum_{j=1}^{n} x_j e_j,$$

or in other words every vector in $\mathbf{R}^n$ is a linear combination of the stan-
dard basis vectors $e_1, \ldots, e_n$. (The notation $\sum_{j=1}^{n} x_j e_j$ is unambiguous
because the operation of vector addition is both commutative and asso-
ciative). Of course, just as every row vector is a linear combination of
standard basis row vectors, every column vector is a linear combination
of standard basis column vectors:

$$x^T = x_1 e_1^T + x_2 e_2^T + \ldots + x_n e_n^T = \sum_{j=1}^{n} x_j e_j^T.$$

There are (many) other ways to create a basis for $\mathbf{R}^n$, but this is a
topic for a linear algebra text and will not be discussed here.


Definition 6.1.6 (Linear transformations). A linear transformation
$T : \mathbf{R}^n \to \mathbf{R}^m$ is any function from one Euclidean space $\mathbf{R}^n$ to another $\mathbf{R}^m$
which obeys the following two axioms:


(a) (Additivity) For every $x, x' \in \mathbf{R}^n$, we have $T(x + x') = Tx + Tx'$.


(b) (Homogeneity) For every $x \in \mathbf{R}^n$ and every c ∈ R, we have
$T(cx) = cTx$.

Example 6.1.7. The dilation operator $T_1 : \mathbf{R}^3 \to \mathbf{R}^3$ defined by
$T_1 x := 5x$ (i.e., it dilates each vector x by a factor of 5) is a linear
transformation, since $5(x + x') = 5x + 5x'$ for all $x, x' \in \mathbf{R}^3$ and
$5(cx) = c(5x)$ for all $x \in \mathbf{R}^3$ and $c \in \mathbf{R}$.


Example 6.1.8. The rotation operator $T_2 : \mathbf{R}^2 \to \mathbf{R}^2$ defined by a
counterclockwise rotation by $\pi/2$ radians around the origin (so that
$T_2(1,0) = (0,1)$, $T_2(0,1) = (-1,0)$, etc.) is a linear transformation;
this can best be seen geometrically rather than analytically.


Example 6.1.9. The projection operator $T_3 : \mathbf{R}^3 \to \mathbf{R}^2$ defined by
$T_3(x,y,z) := (x,y)$ is a linear transformation (why?). The inclusion
operator $T_4 : \mathbf{R}^2 \to \mathbf{R}^3$ defined by $T_4(x,y) := (x,y,0)$ is also a linear
transformation (why?). Finally, the identity operator $I_n : \mathbf{R}^n \to \mathbf{R}^n$,
defined for any n by $I_n x := x$, is also a linear transformation (why?).


As we shall shortly see, there is a connection between linear trans- 
formations and matrices. 


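Definition 6.1.10 (Matrices). An m x n matrix is an object A of the
form

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix};$$

we shall abbreviate this as

$$A = (a_{ij})_{1 \le i \le m; 1 \le j \le n}.$$

In particular, n-dimensional row vectors are 1 x n matrices, while
n-dimensional column vectors are n x 1 matrices.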


Definition 6.1.11 (Matrix product). Given an m x n matrix A and an
n x p matrix B, we can define the matrix product AB to be the m x p
matrix defined as

$$(a_{ij})_{1 \le i \le m; 1 \le j \le n} (b_{jk})_{1 \le j \le n; 1 \le k \le p} := \Big( \sum_{j=1}^n a_{ij} b_{jk} \Big)_{1 \le i \le m; 1 \le k \le p}.$$

In particular, if $x^T = (x_j)^T_{1 \le j \le n}$ is an n-dimensional column vector, and
$A = (a_{ij})_{1 \le i \le m; 1 \le j \le n}$ is an m x n matrix, then $Ax^T$ is an
m-dimensional column vector:

$$Ax^T = \Big( \sum_{j=1}^n a_{ij} x_j \Big)^T_{1 \le i \le m}.$$

We now relate matrices to linear transformations. If A is an m x n
matrix, we can define the transformation $L_A : \mathbf{R}^n \to \mathbf{R}^m$ by the formula

$$(L_A x)^T := Ax^T.$$


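Example 6.1.12. If A is the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix},$$

and $x = (x_1, x_2, x_3)$ is a 3-dimensional row vector, then $L_A x$ is the
2-dimensional row vector defined by

$$(L_A x)^T = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 + 3x_3 \\ 4x_1 + 5x_2 + 6x_3 \end{pmatrix},$$

or in other words

$$L_A(x_1, x_2, x_3) = (x_1 + 2x_2 + 3x_3, 4x_1 + 5x_2 + 6x_3).$$

More generally, if

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix},$$

then we have

$$L_A (x_j)_{1 \le j \le n} = \Big( \sum_{j=1}^n a_{ij} x_j \Big)_{1 \le i \le m}.$$

For any m x n matrix A, the transformation $L_A$ is automatically linear;
one can easily verify that $L_A(x + y) = L_A x + L_A y$ and $L_A(cx) = c(L_A x)$
for any n-dimensional row vectors x, y and any scalar c. (Why?)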
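
The formula for $L_A$ can be tried out on a computer. The following is only a minimal sketch: the helper name `apply_matrix`, the list-of-rows encoding of a matrix, and the test vectors are assumptions made for illustration. It applies the matrix of Example 6.1.12 to two row vectors and checks additivity and homogeneity for that one choice of data, which of course illustrates the claim but does not prove it.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def apply_matrix(A: Matrix, x: Vector) -> Vector:
    """Return L_A x: the i-th entry is the sum over j of a_ij * x_j."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# The 2 x 3 matrix of Example 6.1.12.
A = [[1, 2, 3],
     [4, 5, 6]]

# Sample data (hypothetical, chosen only for the check).
x = [1, 0, -1]
y = [2, 1, 0]
c = 3

Lx = apply_matrix(A, x)
Ly = apply_matrix(A, y)

# Additivity: L_A(x + y) = L_A x + L_A y.
additive = apply_matrix(A, [xi + yi for xi, yi in zip(x, y)]) == [
    a + b for a, b in zip(Lx, Ly)
]

# Homogeneity: L_A(cx) = c (L_A x).
homogeneous = apply_matrix(A, [c * xi for xi in x]) == [c * a for a in Lx]

print(Lx, Ly, additive, homogeneous)
```

Integer entries are used so that the two equalities can be compared exactly rather than up to rounding.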


Perhaps surprisingly, the converse is also true, i.e., every linear
transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$ is given by a matrix:


Lemma 6.1.13. Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Then
there exists exactly one m x n matrix A such that $T = L_A$.


Proof. Suppose $T : \mathbf{R}^n \to \mathbf{R}^m$ is a linear transformation. Let
$e_1, e_2, \ldots, e_n$ be the standard basis row vectors of $\mathbf{R}^n$. Then
$Te_1, Te_2, \ldots, Te_n$ are vectors in $\mathbf{R}^m$. For each $1 \le j \le n$, we write
$Te_j$ in co-ordinates as

$$Te_j = (a_{1j}, a_{2j}, \ldots, a_{mj}) = (a_{ij})_{1 \le i \le m},$$

i.e., we define $a_{ij}$ to be the $i^{\text{th}}$ component of $Te_j$. Then for any
n-dimensional row vector $x = (x_1, \ldots, x_n)$, we have

$$Tx = T\Big( \sum_{j=1}^n x_j e_j \Big),$$

which (since T is linear) is equal to

$$\sum_{j=1}^n T(x_j e_j) = \sum_{j=1}^n x_j Te_j = \sum_{j=1}^n x_j (a_{ij})_{1 \le i \le m} = \Big( \sum_{j=1}^n a_{ij} x_j \Big)_{1 \le i \le m}.$$

But if we let A be the matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix},$$

then the previous vector is precisely $L_A x$. Thus $Tx = L_A x$ for all
n-dimensional vectors x, and thus $T = L_A$.

Now we show that A is unique, i.e., there does not exist any other
matrix

$$B = \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1n} \\ b_{21} & b_{22} & \ldots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \ldots & b_{mn} \end{pmatrix}$$

for which T is equal to $L_B$. Suppose for sake of contradiction that we
could find such a matrix B which was different from A. Then we would
have $L_A = L_B$. In particular, we have $L_A e_j = L_B e_j$ for every $1 \le j \le n$.
But from the definition of $L_A$ we see that

$$L_A e_j = (a_{ij})_{1 \le i \le m}$$

and

$$L_B e_j = (b_{ij})_{1 \le i \le m},$$

and thus we have $a_{ij} = b_{ij}$ for every $1 \le i \le m$ and $1 \le j \le n$; thus A
and B are equal, a contradiction.
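
The existence half of this proof is constructive: the j-th column of A is simply $Te_j$. As a hedged illustration, the sketch below recovers the matrix of a transformation by feeding it the standard basis vectors. The function name `matrix_of` is hypothetical, the transformation is assumed to be given as a Python function on lists, and the two sample maps (a quarter turn $(x,y) \mapsto (-y,x)$ and the projection $(x,y,z) \mapsto (x,y)$) are chosen only as examples.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matrix_of(T: Callable[[Vector], Vector], n: int) -> Matrix:
    """Build the matrix A with T = L_A, assuming T is linear on R^n.

    Column j of A is T(e_j), exactly as in the proof of Lemma 6.1.13.
    """
    columns = [T([1 if k == j else 0 for k in range(n)]) for j in range(n)]
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def quarter_turn(v: Vector) -> Vector:
    """The map (x, y) -> (-y, x), a rotation by pi/2 about the origin."""
    x, y = v
    return [-y, x]


def project_xy(v: Vector) -> Vector:
    """The projection (x, y, z) -> (x, y)."""
    x, y, _z = v
    return [x, y]


rotation_matrix = matrix_of(quarter_turn, 2)
projection_matrix = matrix_of(project_xy, 3)
print(rotation_matrix)
print(projection_matrix)
```

Note that `matrix_of` cannot detect whether its input is actually linear; applied to a non-linear map it still returns a matrix, but that matrix will not reproduce the map.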


Remark 6.1.14. Lemma 6.1.13 establishes a one-to-one correspondence 
between linear transformations and matrices, and is one of the funda- 
mental reasons why matrices are so important in linear algebra. One 
may ask then why we bother dealing with linear transformations at all, 
and why we don’t just work with matrices all the time. The reason 
is that sometimes one does not want to work with the standard basis
$e_1, \ldots, e_n$, but instead wants to use some other basis. In that case, the
correspondence between linear transformations and matrices changes, 
and so it is still important to keep the notions of linear transformation 
and matrix distinct. More discussion on this somewhat subtle issue can 
be found in any linear algebra text. 


Remark 6.1.15. If $T = L_A$, then A is sometimes called the matrix
representation of T, and is sometimes denoted A = [T]. We shall avoid
this notation here, however.


The composition of two linear transformations is again a linear trans- 
formation (Exercise 6.1.2). The next lemma shows that the operation 
of composing linear transformations is connected to that of matrix mul- 
tiplication. 


Lemma 6.1.16. Let A be an m x n matrix, and let B be an n x p matrix.
Then $L_A L_B = L_{AB}$.


Proof. See Exercise 6.1.3. 
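
Before attempting the proof, it may help to see Lemma 6.1.16 hold on one concrete pair of matrices. This is only a sketch under stated assumptions: the helpers `mat_vec` and `mat_mul` and the sample matrices are hypothetical, and a single numerical check is no substitute for the exercise.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def mat_vec(A: Matrix, x: Vector) -> Vector:
    """Return L_A x for a matrix A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Return the matrix product AB as in Definition 6.1.11."""
    n = len(B)          # inner dimension: columns of A, rows of B
    p = len(B[0])
    return [[sum(row[j] * B[j][k] for j in range(n)) for k in range(p)]
            for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3, so L_A maps R^3 to R^2
B = [[1, 0],
     [0, 1],
     [2, -1]]            # 3 x 2, so L_B maps R^2 to R^3
x = [5, 7]

composed = mat_vec(A, mat_vec(B, x))     # L_A (L_B x)
via_product = mat_vec(mat_mul(A, B), x)  # L_{AB} x
print(composed, via_product)
```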


— Exercises — 
Exercise 6.1.1. Prove Lemma 6.1.2. 


Exercise 6.1.2. If $T : \mathbf{R}^n \to \mathbf{R}^m$ is a linear transformation, and $S : \mathbf{R}^p \to \mathbf{R}^n$
is a linear transformation, show that the composition $TS : \mathbf{R}^p \to \mathbf{R}^m$ of the
two transforms, defined by $TS(x) := T(S(x))$, is also a linear transformation.
(Hint: expand $TS(x + y)$ and $TS(cx)$ carefully, using plenty of parentheses.)


Exercise 6.1.3. Prove Lemma 6.1.16. 


Exercise 6.1.4. Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Show that there
exists a number $M > 0$ such that $\|Tx\| \le M\|x\|$ for all $x \in \mathbf{R}^n$. (Hint: use
Lemma 6.1.13 to write T in terms of a matrix A, and then set M to be the sum
of the absolute values of all the entries in A. Use the triangle inequality often;
it's easier than messing around with square roots etc.) Conclude in particular
that every linear transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$ is continuous.
6.2 Derivatives in several variable calculus


Now that we've reviewed some linear algebra, we turn to the main
topic of this chapter, which is that of understanding differentiation of
functions of the form $f : \mathbf{R}^n \to \mathbf{R}^m$, i.e., functions from one Euclidean
space to another. For instance, one might want to differentiate the
function $f : \mathbf{R}^3 \to \mathbf{R}^4$ defined by

$$f(x, y, z) = (xy, yz, xz, xyz).$$

In single variable calculus, when one wants to differentiate a function
$f : E \to \mathbf{R}$ at a point $x_0$, where E is a subset of $\mathbf{R}$ that contains $x_0$,
this is given by

$$f'(x_0) := \lim_{x \to x_0; x \in E - \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0}.$$

One could try to mimic this definition in the several variable case
$f : E \to \mathbf{R}^m$, where E is now a subset of $\mathbf{R}^n$; however, we encounter a
difficulty in this case: the quantity $f(x) - f(x_0)$ will live in $\mathbf{R}^m$, and
$x - x_0$ lives in $\mathbf{R}^n$, and we do not know how to divide an m-dimensional
vector by an n-dimensional vector.

To get around this problem, we first rewrite the concept of derivative
(in one dimension) in a way which does not involve division of vectors.
Instead, we view differentiability at a point $x_0$ as an assertion that a
function f is "approximately linear" near $x_0$.


Lemma 6.2.1. Let E be a subset of $\mathbf{R}$, $f : E \to \mathbf{R}$ be a function,
$x_0 \in E$, and $L \in \mathbf{R}$. Then the following two statements are equivalent.

(a) f is differentiable at $x_0$, and $f'(x_0) = L$.

(b) We have $\lim_{x \to x_0; x \in E - \{x_0\}} \frac{|f(x) - (f(x_0) + L(x - x_0))|}{|x - x_0|} = 0$.


Proof. See Exercise 6.2.1. 


In light of the above lemma, we see that the derivative $f'(x_0)$ can
be interpreted as the number L for which $|f(x) - (f(x_0) + L(x - x_0))|$
is small, in the sense that it tends to zero as x tends to $x_0$, even if
we divide out by the very small number $|x - x_0|$. More informally,
the derivative is the quantity L such that we have the approximation

$$f(x) - f(x_0) \approx L(x - x_0).$$

This does not seem too different from the usual notion of differentiation,
but the point is that we are no longer explicitly dividing by
$x - x_0$. (We are still dividing by $|x - x_0|$, but this will turn out to be
OK.) When we move to the several variable case $f : E \to \mathbf{R}^m$, where
$E \subseteq \mathbf{R}^n$, we shall still want the derivative to be some quantity L such
that $f(x) - f(x_0) \approx L(x - x_0)$. However, since $f(x) - f(x_0)$ is now
an m-dimensional vector and $x - x_0$ is an n-dimensional vector, we no
longer want L to be a scalar; we want it to be a linear transformation.
More precisely:


Definition 6.2.2 (Differentiability). Let E be a subset of $\mathbf{R}^n$,
$f : E \to \mathbf{R}^m$ be a function, $x_0 \in E$ be a point, and let $L : \mathbf{R}^n \to \mathbf{R}^m$ be a linear
transformation. We say that f is differentiable at $x_0$ with derivative L
if we have

$$\lim_{x \to x_0; x \in E - \{x_0\}} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0.$$

Here $\|x\|$ is the length of x (as measured in the $l^2$ metric):

$$\|(x_1, x_2, \ldots, x_n)\| = (x_1^2 + x_2^2 + \ldots + x_n^2)^{1/2}.$$


Example 6.2.3. Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the map $f(x, y) := (x^2, y^2)$, let
$x_0$ be the point $x_0 := (1, 2)$, and let $L : \mathbf{R}^2 \to \mathbf{R}^2$ be the map
$L(x, y) := (2x, 4y)$. We claim that f is differentiable at $x_0$ with derivative L. To
see this, we compute

$$\lim_{(x,y) \to (1,2): (x,y) \ne (1,2)} \frac{\|f(x,y) - (f(1,2) + L((x,y) - (1,2)))\|}{\|(x,y) - (1,2)\|}.$$

Making the change of variables $(x, y) = (1, 2) + (a, b)$, this becomes

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|f(1 + a, 2 + b) - (f(1,2) + L(a,b))\|}{\|(a,b)\|}.$$

Substituting the formula for f and for L, this becomes

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|((1 + a)^2, (2 + b)^2) - (1, 4) - (2a, 4b)\|}{\|(a,b)\|},$$

which simplifies to

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|(a^2, b^2)\|}{\|(a,b)\|}.$$

We use the squeeze test. The expression $\frac{\|(a^2, b^2)\|}{\|(a,b)\|}$ is clearly non-negative.
On the other hand, we have by the triangle inequality

$$\|(a^2, b^2)\| \le \|(a^2, 0)\| + \|(0, b^2)\| = a^2 + b^2$$

and hence

$$\frac{\|(a^2, b^2)\|}{\|(a,b)\|} \le \sqrt{a^2 + b^2}.$$

Since $\sqrt{a^2 + b^2} \to 0$ as $(a, b) \to 0$, we thus see from the squeeze test that
the above limit exists and is equal to 0. Thus f is differentiable at $x_0$
with derivative L.
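
The limit in Example 6.2.3 can also be watched numerically. The sketch below is an illustration only: the helper `error_ratio` and the choice of approach direction $(a, b) = (h, -h)$ are assumptions, and evaluating along one direction for a few values of h does not establish the limit. For this direction the ratio works out to exactly h, so it visibly tends to zero.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def f(x: float, y: float) -> Point:
    """The map f(x, y) = (x^2, y^2) of Example 6.2.3."""
    return (x * x, y * y)


def L(a: float, b: float) -> Point:
    """The candidate derivative L(a, b) = (2a, 4b) at x0 = (1, 2)."""
    return (2 * a, 4 * b)


def norm(v: Point) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


def error_ratio(a: float, b: float) -> float:
    """||f(x0 + (a,b)) - f(x0) - L(a,b)|| / ||(a,b)|| at x0 = (1, 2)."""
    fx = f(1 + a, 2 + b)
    f0 = f(1, 2)
    lin = L(a, b)
    err = (fx[0] - f0[0] - lin[0], fx[1] - f0[1] - lin[1])
    return norm(err) / norm((a, b))


steps = (1e-1, 1e-2, 1e-3)
ratios = [error_ratio(h, -h) for h in steps]
for h, r in zip(steps, ratios):
    print(h, r)
```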


As you can see, verifying that a function is differentiable from first 
principles can be somewhat tedious. Later on we shall find better ways 
to verify differentiability, and to compute derivatives. 

Before we proceed further, we have to check a basic fact, which is 
that a function can have at most one derivative at any interior point of 
its domain: 


Lemma 6.2.4 (Uniqueness of derivatives). Let E be a subset of $\mathbf{R}^n$,
$f : E \to \mathbf{R}^m$ be a function, $x_0 \in E$ be an interior point of E, and let
$L_1 : \mathbf{R}^n \to \mathbf{R}^m$ and $L_2 : \mathbf{R}^n \to \mathbf{R}^m$ be linear transformations. Suppose
that f is differentiable at $x_0$ with derivative $L_1$, and also differentiable
at $x_0$ with derivative $L_2$. Then $L_1 = L_2$.


Proof. See Exercise 6.2.2. 


Because of Lemma 6.2.4, we can now talk about the derivative of f
at interior points $x_0$, and we will denote this derivative by $f'(x_0)$. Thus
$f'(x_0)$ is the unique linear transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$ such that

$$\lim_{x \to x_0; x \ne x_0} \frac{\|f(x) - (f(x_0) + f'(x_0)(x - x_0))\|}{\|x - x_0\|} = 0.$$

Informally, this means that the derivative $f'(x_0)$ is the linear
transformation such that we have

$$f(x) - f(x_0) \approx f'(x_0)(x - x_0)$$

or equivalently

$$f(x) \approx f(x_0) + f'(x_0)(x - x_0)$$

(this is known as Newton's approximation; compare with Proposition
10.1.7).

Another consequence of Lemma 6.2.4 is that if you know that $f(x) = g(x)$
for all $x \in E$, and f, g are differentiable at $x_0$, then you also know
that $f'(x_0) = g'(x_0)$ at every interior point of E. However, this is not
necessarily true if $x_0$ is a boundary point of E; for instance, if E is just
a single point $E = \{x_0\}$, merely knowing that $f(x_0) = g(x_0)$ does not
imply that $f'(x_0) = g'(x_0)$. We will not deal with these boundary issues
here, and only compute derivatives on the interior of the domain.

We will sometimes refer to $f'$ as the total derivative of f, to distinguish
this concept from that of partial and directional derivatives below.
The total derivative $f'$ is also closely related to the derivative matrix $Df$,
which we shall define in the next section.


— Exercises — 
Exercise 6.2.1. Prove Lemma 6.2.1. 


Exercise 6.2.2. Prove Lemma 6.2.4. (Hint: prove by contradiction. If $L_1 \ne L_2$,
then there exists a vector v such that $L_1 v \ne L_2 v$; this vector must be non-zero
(why?). Now apply the definition of derivative, and try to specialize to the case
where $x = x_0 + tv$ for some scalar t, to obtain a contradiction.)


6.3 Partial and directional derivatives 


We now connect the notion of differentiability with that of partial and 
directional derivatives, which we now introduce. 


Definition 6.3.1 (Directional derivative). Let E be a subset of $\mathbf{R}^n$,
$f : E \to \mathbf{R}^m$ be a function, let $x_0$ be an interior point of E, and let v be
a vector in $\mathbf{R}^n$. If the limit

$$\lim_{t \to 0; t > 0, x_0 + tv \in E} \frac{f(x_0 + tv) - f(x_0)}{t}$$

exists, we say that f is differentiable in the direction v at $x_0$, and we
denote the above limit by $D_v f(x_0)$:

$$D_v f(x_0) := \lim_{t \to 0; t > 0} \frac{f(x_0 + tv) - f(x_0)}{t}.$$


Remark 6.3.2. One should compare this definition with Definition
6.2.2. Note that we are dividing by a scalar t, rather than a vector,
so this definition makes sense, and $D_v f(x_0)$ will be a vector in $\mathbf{R}^m$. It is
sometimes possible to also define directional derivatives on the boundary
of E, if the vector v is pointing in an "inward" direction (this generalizes
the notion of left derivatives and right derivatives from single variable
calculus); but we will not pursue these matters here.


Example 6.3.3. If $f : \mathbf{R} \to \mathbf{R}$ is a function, then $D_{+1} f(x)$ is the same
as the right derivative of f(x) (if it exists), and similarly $D_{-1} f(x)$ is the
same as the left derivative of f(x) (if it exists).


Example 6.3.4. We use the function $f : \mathbf{R}^2 \to \mathbf{R}^2$ defined by
$f(x, y) := (x^2, y^2)$ from before, and let $x_0 := (1, 2)$ and $v := (3, 4)$. Then

$$D_v f(x_0) = \lim_{t \to 0; t > 0} \frac{f(1 + 3t, 2 + 4t) - f(1, 2)}{t}$$

$$= \lim_{t \to 0; t > 0} \frac{(1 + 6t + 9t^2, 4 + 16t + 16t^2) - (1, 4)}{t}$$

$$= \lim_{t \to 0; t > 0} (6 + 9t, 16 + 16t) = (6, 16).$$
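
The difference quotient in Example 6.3.4 is easy to evaluate for a small positive t. The following sketch assumes a hypothetical helper `directional_quotient` and a step size of $10^{-6}$ chosen for illustration; it approximates the limit rather than computing it, so the comparison with (6, 16) is made only up to a tolerance.

```python
from typing import Callable, Sequence, Tuple


def f(x: float, y: float) -> Tuple[float, float]:
    """The map f(x, y) = (x^2, y^2) used in Example 6.3.4."""
    return (x * x, y * y)


def directional_quotient(
    func: Callable[..., Sequence[float]],
    x0: Sequence[float],
    v: Sequence[float],
    t: float,
) -> Tuple[float, ...]:
    """Return (func(x0 + t v) - func(x0)) / t for a given t > 0."""
    shifted = func(*[p + t * d for p, d in zip(x0, v)])
    base = func(*x0)
    return tuple((a - b) / t for a, b in zip(shifted, base))


# For this f the quotient equals (6 + 9t, 16 + 16t), so it approaches (6, 16).
quotient = directional_quotient(f, (1.0, 2.0), (3.0, 4.0), 1e-6)
print(quotient)
```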


Directional derivatives are connected with total derivatives as fol- 
lows: 


Lemma 6.3.5. Let E be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a function,
$x_0$ be an interior point of E, and let v be a vector in $\mathbf{R}^n$. If f is
differentiable at $x_0$, then f is also differentiable in the direction v at $x_0$,
and

$$D_v f(x_0) = f'(x_0)v.$$

Proof. See Exercise 6.3.1.


Remark 6.3.6. One consequence of this lemma is that total differentia- 
bility implies directional differentiability. However, the converse is not 
true; see Exercise 6.3.3. 


Closely related to the concept of directional derivative is that of 
partial derivative: 


Definition 6.3.7 (Partial derivative). Let E be a subset of $\mathbf{R}^n$, let
$f : E \to \mathbf{R}^m$ be a function, let $x_0$ be an interior point of E, and let
$1 \le j \le n$. Then the partial derivative of f with respect to the $x_j$
variable at $x_0$, denoted $\frac{\partial f}{\partial x_j}(x_0)$, is defined by

$$\frac{\partial f}{\partial x_j}(x_0) := \lim_{t \to 0; t \ne 0, x_0 + te_j \in E} \frac{f(x_0 + te_j) - f(x_0)}{t} = \frac{d}{dt} f(x_0 + te_j)\Big|_{t=0},$$

provided of course that the limit exists. (If the limit does not exist, we
leave $\frac{\partial f}{\partial x_j}(x_0)$ undefined.)
Informally, the partial derivative can be obtained by holding all the
variables other than $x_j$ fixed, and then applying the single-variable
calculus derivative in the $x_j$ variable. Note that if f takes values in $\mathbf{R}^m$,
then so will $\frac{\partial f}{\partial x_j}$. Indeed, if we write f in components as $f = (f_1, \ldots, f_m)$,
it is easy to see (why?) that

$$\frac{\partial f}{\partial x_j}(x_0) = \Big( \frac{\partial f_1}{\partial x_j}(x_0), \ldots, \frac{\partial f_m}{\partial x_j}(x_0) \Big),$$

i.e., to differentiate a vector-valued function one just has to differentiate
each of the components separately.

We sometimes replace the variables $x_j$ in $\frac{\partial f}{\partial x_j}$ with other symbols.
For instance, if we are dealing with the function $f(x, y) = (x^2, y^2)$, then
we might refer to $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ instead of $\frac{\partial f}{\partial x_1}$ and $\frac{\partial f}{\partial x_2}$. (In this case,
$\frac{\partial f}{\partial x}(x, y) = (2x, 0)$ and $\frac{\partial f}{\partial y}(x, y) = (0, 2y)$.) One should caution however
that one should only relabel the variables if it is absolutely clear which
symbol refers to the first variable, which symbol refers to the second
variable, etc.; otherwise one may become unintentionally confused. For
instance, in the above example, the expression $\frac{\partial f}{\partial x}(x, x)$ is just $(2x, 0)$,
however one may mistakenly compute

$$\frac{\partial f}{\partial x}(x, x) = \frac{\partial}{\partial x}(x^2, x^2) = (2x, 2x);$$

the problem here is that the symbol x is being used for more than just
the first variable of f. (On the other hand, it is true that $\frac{d}{dx} f(x, x)$ is
equal to $(2x, 2x)$; thus the operation of total differentiation $\frac{d}{dx}$ is not the
same as that of partial differentiation $\frac{\partial}{\partial x}$.)

From Lemma 6.3.5, we know that if a function is differentiable at a
point $x_0$, then all the partial derivatives $\frac{\partial f}{\partial x_j}$ exist at $x_0$, and that

$$\frac{\partial f}{\partial x_j}(x_0) = f'(x_0) e_j.$$

Also, if $v = (v_1, \ldots, v_n) = \sum_j v_j e_j$, then we have

$$D_v f(x_0) = f'(x_0) \sum_j v_j e_j = \sum_j v_j f'(x_0) e_j$$

(since $f'(x_0)$ is linear) and thus

$$D_v f(x_0) = \sum_j v_j \frac{\partial f}{\partial x_j}(x_0).$$

Thus one can write directional derivatives in terms of partial derivatives,
provided that the function is actually differentiable at that point.

Just because the partial derivatives exist at a point $x_0$, we cannot
conclude that the function is differentiable there (Exercise 6.3.3).
However, if we know that the partial derivatives not only exist, but are
continuous, then we can in fact conclude differentiability, thanks to the
following handy theorem:


Theorem 6.3.8. Let E be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a function,
F be a subset of E, and $x_0$ be an interior point of F. If all the partial
derivatives $\frac{\partial f}{\partial x_j}$ exist on F and are continuous at $x_0$, then f is
differentiable at $x_0$, and the linear transformation $f'(x_0) : \mathbf{R}^n \to \mathbf{R}^m$ is defined
by

$$f'(x_0)(v_j)_{1 \le j \le n} = \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0).$$


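Proof. Let $L : \mathbf{R}^n \to \mathbf{R}^m$ be the linear transformation

$$L(v_j)_{1 \le j \le n} := \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0).$$

We have to prove that

$$\lim_{x \to x_0; x \in E - \{x_0\}} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0.$$

Let $\varepsilon > 0$. It will suffice to find a radius $\delta > 0$ such that

$$\frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} \le \varepsilon$$

for all $x \in B(x_0, \delta) \setminus \{x_0\}$. Equivalently, we wish to show that

$$\|f(x) - f(x_0) - L(x - x_0)\| \le \varepsilon \|x - x_0\|$$

for all $x \in B(x_0, \delta) \setminus \{x_0\}$.

Because $x_0$ is an interior point of F, there exists a ball $B(x_0, r)$ which
is contained inside F. Because each partial derivative $\frac{\partial f}{\partial x_j}$ is continuous
at $x_0$, there thus exists an $0 < \delta_j < r$ such that
$\|\frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_j}(x_0)\| \le \varepsilon/nm$ for every $x \in B(x_0, \delta_j)$.
If we take $\delta = \min(\delta_1, \ldots, \delta_n)$, then we thus have
$\|\frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_j}(x_0)\| \le \varepsilon/nm$ for every $x \in B(x_0, \delta)$ and every
$1 \le j \le n$.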
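Let $x \in B(x_0, \delta)$. We write $x = x_0 + v_1 e_1 + v_2 e_2 + \ldots + v_n e_n$ for
some scalars $v_1, \ldots, v_n$. Note that

$$\|x - x_0\| = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}$$

and in particular we have $|v_j| \le \|x - x_0\|$ for all $1 \le j \le n$. Our task is
to show that

$$\Big\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0) - \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x_0) v_j \Big\| \le \varepsilon \|x - x_0\|.$$

Write f in components as $f = (f_1, f_2, \ldots, f_m)$ (so each $f_i$ is a function
from E to $\mathbf{R}$). From the mean value theorem in the $x_1$ variable, we see
that

$$f_i(x_0 + v_1 e_1) - f_i(x_0) = \frac{\partial f_i}{\partial x_1}(x_0 + t_i e_1) v_1$$

for some $t_i$ between 0 and $v_1$. But we have

$$\Big| \frac{\partial f_i}{\partial x_1}(x_0 + t_i e_1) - \frac{\partial f_i}{\partial x_1}(x_0) \Big| \le \Big\| \frac{\partial f}{\partial x_1}(x_0 + t_i e_1) - \frac{\partial f}{\partial x_1}(x_0) \Big\| \le \varepsilon/nm$$

and hence

$$\Big| f_i(x_0 + v_1 e_1) - f_i(x_0) - \frac{\partial f_i}{\partial x_1}(x_0) v_1 \Big| \le \varepsilon |v_1|/nm.$$

Summing this over all $1 \le i \le m$ (and noting that $\|(y_1, \ldots, y_m)\| \le |y_1| + \ldots + |y_m|$
from the triangle inequality) we obtain

$$\Big\| f(x_0 + v_1 e_1) - f(x_0) - \frac{\partial f}{\partial x_1}(x_0) v_1 \Big\| \le \varepsilon |v_1|/n;$$

since $|v_1| \le \|x - x_0\|$, we thus have

$$\Big\| f(x_0 + v_1 e_1) - f(x_0) - \frac{\partial f}{\partial x_1}(x_0) v_1 \Big\| \le \varepsilon \|x - x_0\|/n.$$

A similar argument gives

$$\Big\| f(x_0 + v_1 e_1 + v_2 e_2) - f(x_0 + v_1 e_1) - \frac{\partial f}{\partial x_2}(x_0) v_2 \Big\| \le \varepsilon \|x - x_0\|/n,$$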
and so forth up to

$$\Big\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0 + v_1 e_1 + \ldots + v_{n-1} e_{n-1}) - \frac{\partial f}{\partial x_n}(x_0) v_n \Big\| \le \varepsilon \|x - x_0\|/n.$$

If we sum these n inequalities and use the triangle inequality
$\|x + y\| \le \|x\| + \|y\|$, we obtain a telescoping series which simplifies to

$$\Big\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0) - \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x_0) v_j \Big\| \le \varepsilon \|x - x_0\|$$

as desired.


From Theorem 6.3.8 and Lemma 6.3.5 we see that if the partial
derivatives of a function $f : E \to \mathbf{R}^m$ exist and are continuous on some
set F, then all the directional derivatives also exist at every interior
point $x_0$ of F, and we have the formula

$$D_{(v_1, \ldots, v_n)} f(x_0) = \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0).$$

In particular, if $f : E \to \mathbf{R}$ is a real-valued function, and we define
the gradient $\nabla f(x_0)$ of f at $x_0$ to be the n-dimensional row vector
$\nabla f(x_0) := (\frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_n}(x_0))$, then we have the familiar formula

$$D_v f(x_0) = v \cdot \nabla f(x_0)$$

whenever $x_0$ is in the interior of the region where the gradient exists and
is continuous.

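More generally, if $f : E \to \mathbf{R}^m$ is a function taking values in $\mathbf{R}^m$,
with $f = (f_1, \ldots, f_m)$, and $x_0$ is in the interior of the region where the
partial derivatives of f exist and are continuous, then we have from
Theorem 6.3.8 that

$$f'(x_0)(v_j)_{1 \le j \le n} = \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0) = \Big( \sum_{j=1}^n v_j \frac{\partial f_i}{\partial x_j}(x_0) \Big)_{i=1}^m,$$

which we can rewrite as

$$L_{Df(x_0)} (v_j)_{1 \le j \le n}$$

where $Df(x_0)$ is the m x n matrix

$$Df(x_0) := \Big( \frac{\partial f_i}{\partial x_j}(x_0) \Big)_{1 \le i \le m; 1 \le j \le n} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \frac{\partial f_1}{\partial x_2}(x_0) & \ldots & \frac{\partial f_1}{\partial x_n}(x_0) \\ \frac{\partial f_2}{\partial x_1}(x_0) & \frac{\partial f_2}{\partial x_2}(x_0) & \ldots & \frac{\partial f_2}{\partial x_n}(x_0) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x_0) & \frac{\partial f_m}{\partial x_2}(x_0) & \ldots & \frac{\partial f_m}{\partial x_n}(x_0) \end{pmatrix}.$$

Thus we have

$$(D_v f(x_0))^T = (f'(x_0) v)^T = Df(x_0) v^T.$$

The matrix $Df(x_0)$ is sometimes also called the derivative matrix or
differential matrix of f at $x_0$, and is closely related to the total derivative
$f'(x_0)$. One can also write Df as

$$Df(x_0) = \Big( \frac{\partial f}{\partial x_1}(x_0)^T, \frac{\partial f}{\partial x_2}(x_0)^T, \ldots, \frac{\partial f}{\partial x_n}(x_0)^T \Big),$$

i.e., each of the columns of $Df(x_0)$ is one of the partial derivatives of f,
expressed as a column vector. Or one could write

$$Df(x_0) = \begin{pmatrix} \nabla f_1(x_0) \\ \nabla f_2(x_0) \\ \vdots \\ \nabla f_m(x_0) \end{pmatrix},$$

i.e., the rows of $Df(x_0)$ are the gradients of the various components of f. In
particular, if f is scalar-valued (i.e., m = 1), then Df is the same as
$\nabla f$.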


Example 6.3.9. Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the function $f(x, y) = (x^2 + xy, y^2)$.
Then $\frac{\partial f}{\partial x} = (2x + y, 0)$ and $\frac{\partial f}{\partial y} = (x, 2y)$. Since these partial
derivatives are continuous on $\mathbf{R}^2$, we see that f is differentiable on all
of $\mathbf{R}^2$, and

$$Df(x, y) = \begin{pmatrix} 2x + y & x \\ 0 & 2y \end{pmatrix}.$$

Thus for instance, the directional derivative in the direction (v, w) is

$$D_{(v,w)} f(x, y) = ((2x + y)v + xw, 2yw).$$
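
As a hedged cross-check of Example 6.3.9, the sketch below compares the matrix formula $Df(x,y)(v,w)^T$ with a one-sided difference quotient. The helper names, the base point (1, 2), the direction (3, -1) and the step size are all assumptions made for illustration; agreement up to a small tolerance is a sanity check on the algebra, not a proof of differentiability.

```python
from typing import List, Tuple


def f(x: float, y: float) -> Tuple[float, float]:
    """The function f(x, y) = (x^2 + xy, y^2) of Example 6.3.9."""
    return (x * x + x * y, y * y)


def Df(x: float, y: float) -> List[List[float]]:
    """The derivative matrix computed in the example."""
    return [[2 * x + y, x],
            [0.0, 2 * y]]


def mat_vec(A: List[List[float]], u: List[float]) -> List[float]:
    """Multiply the matrix A by the vector u."""
    return [sum(a * b for a, b in zip(row, u)) for row in A]


def difference_quotient(x: float, y: float, v: float, w: float,
                        t: float = 1e-6) -> List[float]:
    """Return (f((x,y) + t(v,w)) - f(x,y)) / t for a small t > 0."""
    shifted = f(x + t * v, y + t * w)
    base = f(x, y)
    return [(a - b) / t for a, b in zip(shifted, base)]


exact = mat_vec(Df(1.0, 2.0), [3.0, -1.0])        # ((2x+y)v + xw, 2yw)
approx = difference_quotient(1.0, 2.0, 3.0, -1.0)
print(exact, approx)
```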


— Exercises — 
Exercise 6.3.1. Prove Lemma 6.3.5. (This will be similar to Exercise 6.2.1). 


Exercise 6.3.2. Let E be a subset of $\mathbf{R}^n$, let $f : E \to \mathbf{R}^m$ be a function, let
$x_0$ be an interior point of E, and let $1 \le j \le n$. Show that $\frac{\partial f}{\partial x_j}(x_0)$ exists
if and only if $D_{e_j} f(x_0)$ and $D_{-e_j} f(x_0)$ exist and are negatives of each other
(thus $D_{e_j} f(x_0) = -D_{-e_j} f(x_0)$); furthermore, one has $\frac{\partial f}{\partial x_j}(x_0) = D_{e_j} f(x_0)$ in
this case.

Exercise 6.3.3. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x, y) := \frac{x^3}{x^2 + y^2}$
when $(x, y) \ne (0, 0)$, and $f(0, 0) := 0$. Show that f is not differentiable at
(0, 0), despite being differentiable in every direction $v \in \mathbf{R}^2$ at (0, 0). Explain
why this does not contradict Theorem 6.3.8.

Exercise 6.3.4. Let $f : \mathbf{R}^n \to \mathbf{R}^m$ be a differentiable function such that $f'(x) = 0$
for all $x \in \mathbf{R}^n$. Show that f is constant. (Hint: you may use the mean-value
theorem or fundamental theorem of calculus for one-dimensional functions,
but bear in mind that there is no direct analogue of these theorems for
several-variable functions. I would not advise proceeding via first principles.) For a
tougher challenge, replace the domain $\mathbf{R}^n$ by an open connected subset $\Omega$ of
$\mathbf{R}^n$.


6.4 The several variable calculus chain rule 


We are now ready to state the several variable calculus chain rule. Recall
that if $f : X \to Y$ and $g : Y \to Z$ are two functions, then the composition
$g \circ f : X \to Z$ is defined by $g \circ f(x) := g(f(x))$ for all $x \in X$.


Theorem 6.4.1 (Several variable calculus chain rule). Let E be a subset
of $\mathbf{R}^n$, and let F be a subset of $\mathbf{R}^m$. Let $f : E \to F$ be a function, and let
$g : F \to \mathbf{R}^p$ be another function. Let $x_0$ be a point in the interior of E.
Suppose that f is differentiable at $x_0$, and that $f(x_0)$ is in the interior of
F. Suppose also that g is differentiable at $f(x_0)$. Then $g \circ f : E \to \mathbf{R}^p$
is also differentiable at $x_0$, and we have the formula

$$(g \circ f)'(x_0) = g'(f(x_0)) f'(x_0).$$

Proof. See Exercise 6.4.3.


One should compare this theorem with the single-variable chain rule, 
Theorem 10.1.15; indeed one can easily deduce the single-variable rule 
as a consequence of the several-variable rule. 
Intuitively, one can think of the several variable chain rule as follows.
Let x be close to $x_0$. Then Newton's approximation asserts that

$$f(x) - f(x_0) \approx f'(x_0)(x - x_0)$$

and in particular f(x) is close to $f(x_0)$. Since g is differentiable at $f(x_0)$,
we see from Newton's approximation again that

$$g(f(x)) - g(f(x_0)) \approx g'(f(x_0))(f(x) - f(x_0)).$$

Combining the two, we obtain

$$g \circ f(x) - g \circ f(x_0) \approx g'(f(x_0)) f'(x_0)(x - x_0)$$

which then should give $(g \circ f)'(x_0) = g'(f(x_0)) f'(x_0)$. This argument
however is rather imprecise; to make it more precise one needs to
manipulate limits rigorously; see Exercise 6.4.3.

As a corollary of the chain rule and Lemma 6.1.16 (and Lemma
6.1.13), we see that

$$D(g \circ f)(x_0) = Dg(f(x_0)) Df(x_0);$$

i.e., we can write the chain rule in terms of matrices and matrix
multiplication, instead of in terms of linear transformations and composition.
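
The matrix form of the chain rule lends itself to a numerical sanity check. In the sketch below everything concrete is an assumption chosen for illustration: the maps $f(x,y) = (x^2 + xy, y^2)$ and $g(a,b) = (a + b, ab)$, their hand-computed derivative matrices, the base point (1, 2), and the helper `numerical_jacobian`, which estimates $D(g \circ f)$ by one-sided difference quotients. The product $Dg(f(x_0))Df(x_0)$ should agree with that estimate up to a small error.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def f(p: Vector) -> Vector:
    x, y = p
    return [x * x + x * y, y * y]


def Df(p: Vector) -> Matrix:
    x, y = p
    return [[2 * x + y, x], [0.0, 2 * y]]


def g(q: Vector) -> Vector:
    a, b = q
    return [a + b, a * b]


def Dg(q: Vector) -> Matrix:
    a, b = q
    return [[1.0, 1.0], [b, a]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Matrix product AB for matrices stored as lists of rows."""
    return [[sum(row[j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for row in A]


def numerical_jacobian(F: Callable[[Vector], Vector], p: Vector,
                       h: float = 1e-6) -> Matrix:
    """Estimate DF(p) column by column using difference quotients."""
    base = F(p)
    columns = []
    for j in range(len(p)):
        shifted = list(p)
        shifted[j] += h
        columns.append([(a - b) / h for a, b in zip(F(shifted), base)])
    return [[columns[j][i] for j in range(len(p))] for i in range(len(base))]


x0 = [1.0, 2.0]
chain = mat_mul(Dg(f(x0)), Df(x0))                    # Dg(f(x0)) Df(x0)
estimate = numerical_jacobian(lambda p: g(f(p)), x0)  # D(g o f)(x0), approx.
print(chain)
print(estimate)
```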


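Example 6.4.2. Let $f : \mathbf{R}^n \to \mathbf{R}$ and $g : \mathbf{R}^n \to \mathbf{R}$ be differentiable
functions. We form the combined function $h : \mathbf{R}^n \to \mathbf{R}^2$ by defining
$h(x) := (f(x), g(x))$. Now let $k : \mathbf{R}^2 \to \mathbf{R}$ be the multiplication function
$k(a, b) := ab$. Note that

$$Dh(x_0) = \begin{pmatrix} \nabla f(x_0) \\ \nabla g(x_0) \end{pmatrix}$$

while

$$Dk(a, b) = (b, a)$$

(why?). By the chain rule, we thus see that

$$D(k \circ h)(x_0) = (g(x_0), f(x_0)) \begin{pmatrix} \nabla f(x_0) \\ \nabla g(x_0) \end{pmatrix} = g(x_0) \nabla f(x_0) + f(x_0) \nabla g(x_0).$$

But $k \circ h = fg$ (why?), and $D(fg) = \nabla(fg)$. We have thus proven the
product rule

$$\nabla(fg) = g \nabla f + f \nabla g.$$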
A similar argument gives the sum rule $\nabla(f + g) = \nabla f + \nabla g$, or the
difference rule $\nabla(f - g) = \nabla f - \nabla g$, as well as the quotient rule (Exercise
6.4.4). As you can see, the several variable chain rule is quite powerful,
and can be used to deduce many other rules of differentiation.

We do record one further useful application of the chain rule. Let
$T : \mathbf{R}^m \to \mathbf{R}^p$ be a linear transformation. From Exercise 6.4.1 we
observe that T is continuously differentiable at every point, and in fact
$T'(x) = T$ for every x. (This equation may look a little strange, but
perhaps it is easier to swallow if you view it in the form $\frac{d}{dx}(Tx) = T$.)
Thus, for any differentiable function $f : E \to \mathbf{R}^m$, we see that $Tf : E \to \mathbf{R}^p$
is also differentiable, and hence by the chain rule

$$(Tf)'(x_0) = T(f'(x_0)).$$

This is a generalization of the single-variable calculus rule $(cf)' = c(f')$
for constant scalars c.

Another special case of the chain rule which is quite useful is the
following: if $f : \mathbf{R}^n \to \mathbf{R}^m$ is some differentiable function, and $x_j : \mathbf{R} \to \mathbf{R}$
are differentiable functions for each $j = 1, \ldots, n$, then

$$\frac{d}{dt} f(x_1(t), x_2(t), \ldots, x_n(t)) = \sum_{j=1}^n x_j'(t) \frac{\partial f}{\partial x_j}(x_1(t), x_2(t), \ldots, x_n(t)).$$

(Why is this a special case of the chain rule?)
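
To see this last formula in action, here is a small sketch under assumed data: the scalar function $f(x, y) = x^2 y$ and the curves $x_1(t) = t^2$, $x_2(t) = 3t$ are hypothetical choices, with derivatives written out by hand. The composite is $3t^5$, so its derivative is $15t^4$, which the formula should reproduce; a central difference quotient is included as an independent approximate check.

```python
def f(x: float, y: float) -> float:
    """Sample scalar function f(x, y) = x^2 * y."""
    return x * x * y


def df_dx(x: float, y: float) -> float:
    return 2 * x * y


def df_dy(x: float, y: float) -> float:
    return x * x


def x1(t: float) -> float:
    return t * t


def x1_prime(t: float) -> float:
    return 2 * t


def x2(t: float) -> float:
    return 3 * t


def x2_prime(t: float) -> float:
    return 3


def chain_rule_value(t: float) -> float:
    """Sum over j of x_j'(t) * (df/dx_j)(x_1(t), x_2(t))."""
    return (x1_prime(t) * df_dx(x1(t), x2(t))
            + x2_prime(t) * df_dy(x1(t), x2(t)))


def central_difference(t: float, h: float = 1e-5) -> float:
    """Approximate d/dt f(x_1(t), x_2(t)) by a symmetric quotient."""
    return (f(x1(t + h), x2(t + h)) - f(x1(t - h), x2(t - h))) / (2 * h)


t0 = 2
by_formula = chain_rule_value(t0)   # should equal 15 * t0**4
by_quotient = central_difference(t0)
print(by_formula, by_quotient)
```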


— Exercises — 


Exercise 6.4.1. Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Show that T is
continuously differentiable at every point, and in fact $T'(x) = T$ for every x.
What is DT?


Exercise 6.4.2. Let E be a subset of $\mathbf{R}^n$. Prove that if a function $f : E \to \mathbf{R}^m$
is differentiable at an interior point $x_0$ of E, then it is also continuous at $x_0$.
(Hint: use Exercise 6.1.4.)


Exercise 6.4.3. Prove Theorem 6.4.1. (Hint: you may wish to review the proof 
of the ordinary chain rule in single variable calculus, Theorem 10.1.15. The 
easiest way to proceed is by using the sequence-based definition of limit (see 
Proposition 3.1.5(b)), and use Exercise 6.1.4.) 


Exercise 6.4.4. State and prove some version of the quotient rule for functions
of several variables (i.e., functions of the form $f : E \to \mathbf{R}$ for some subset E of
$\mathbf{R}^n$). In other words, state a rule which gives a formula for the gradient of f/g;
compare your answer with Theorem 10.1.13(h). Be sure to make clear what all
your assumptions are.
Exercise 6.4.5. Let $\vec{x} : \mathbf{R} \to \mathbf{R}^3$ be a differentiable function, and let $r : \mathbf{R} \to \mathbf{R}$
be the function $r(t) := \|\vec{x}(t)\|$, where $\|\vec{x}\|$ denotes the length of $\vec{x}$ as measured
in the usual $l^2$ metric. Let $t_0$ be a real number. Show that if $r(t_0) \ne 0$, then r
is differentiable at $t_0$, and

$$r'(t_0) = \frac{\vec{x}'(t_0) \cdot \vec{x}(t_0)}{r(t_0)}.$$

(Hint: use Theorem 6.4.1.)


6.5 Double derivatives and Clairaut’s theorem 


We now investigate what happens if one differentiates a function twice. 


Definition 6.5.1 (Twice continuous differentiability). Let E be an open
subset of $\mathbf{R}^n$, and let $f : E \to \mathbf{R}^m$ be a function. We say that f is
continuously differentiable if the partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$ exist and
are continuous on E. We say that f is twice continuously differentiable if
it is continuously differentiable, and the partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$
are themselves continuously differentiable.


Remark 6.5.2. Continuously differentiable functions are sometimes
called $C^1$ functions; twice continuously differentiable functions are
sometimes called $C^2$ functions. One can also define $C^3$, $C^4$, etc. but we shall
not do so here.


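Example 6.5.3. Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the function $f(x, y) = (x^2 + xy, y^2)$.
Then f is continuously differentiable because the partial derivatives
$\frac{\partial f}{\partial x}(x, y) = (2x + y, 0)$ and $\frac{\partial f}{\partial y}(x, y) = (x, 2y)$ exist and are
continuous on all of $\mathbf{R}^2$. It is also twice continuously differentiable, because
the double partial derivatives $\frac{\partial}{\partial x} \frac{\partial f}{\partial x}(x, y) = (2, 0)$, $\frac{\partial}{\partial y} \frac{\partial f}{\partial x}(x, y) = (1, 0)$,
$\frac{\partial}{\partial x} \frac{\partial f}{\partial y}(x, y) = (1, 0)$, $\frac{\partial}{\partial y} \frac{\partial f}{\partial y}(x, y) = (0, 2)$ all exist and are continuous.

Observe in the above example that the double derivatives $\frac{\partial}{\partial y} \frac{\partial f}{\partial x}$ and
$\frac{\partial}{\partial x} \frac{\partial f}{\partial y}$ are the same. This is in fact a general phenomenon:

Theorem 6.5.4 (Clairaut's theorem). Let E be an open subset of $\mathbf{R}^n$,
and let $f : E \to \mathbf{R}^m$ be a twice continuously differentiable function on
E. Then we have $\frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i}(x_0) = \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j}(x_0)$ for all $x_0 \in E$ and all $1 \le i, j \le n$.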


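Proof. By working with one component of f at a time we can assume
that m = 1. The claim is trivial if i = j, so we shall assume that
$i \ne j$. We shall prove the theorem for $x_0 = 0$; the general case is
similar. (Actually, once one proves Clairaut's theorem for $x_0 = 0$, one
can immediately obtain it for general $x_0$ by applying the theorem with
f(x) replaced by $f(x + x_0)$.)

Let a be the number $a := \frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i}(0)$, and let $a'$ denote the quantity
$a' := \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j}(0)$. Our task is to show that $a' = a$.

Let $\varepsilon > 0$. Because the double derivatives of f are continuous, we
can find a $\delta > 0$ such that

$$\Big| \frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i}(x) - a \Big| \le \varepsilon$$

and

$$\Big| \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j}(x) - a' \Big| \le \varepsilon$$

whenever $|x| \le 2\delta$.

Now we consider the quantity

$$X := f(\delta e_i + \delta e_j) - f(\delta e_i) - f(\delta e_j) + f(0).$$

From the fundamental theorem of calculus in the $e_i$ variable, we have

$$f(\delta e_i + \delta e_j) - f(\delta e_j) = \int_0^\delta \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j)\, dx_i$$

and

$$f(\delta e_i) - f(0) = \int_0^\delta \frac{\partial f}{\partial x_i}(x_i e_i)\, dx_i$$

and hence

$$X = \int_0^\delta \Big( \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) \Big)\, dx_i.$$

But by the mean value theorem, for each $x_i$ we have

$$\frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) = \delta \frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i}(x_i e_i + x_j e_j)$$

for some $0 \le x_j \le \delta$. By our construction of $\delta$, we thus have

$$\Big| \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) - \delta a \Big| \le \varepsilon \delta.$$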
Integrating this from 0 to $\delta$, we thus obtain

$$|X - \delta^2 a| \le \varepsilon \delta^2.$$

We can run the same argument with the role of i and j reversed
(note that X is symmetric in i and j), to obtain

$$|X - \delta^2 a'| \le \varepsilon \delta^2.$$

From the triangle inequality we thus obtain

$$|\delta^2 a - \delta^2 a'| \le 2 \varepsilon \delta^2,$$

and thus

$$|a - a'| \le 2\varepsilon.$$

But this is true for all $\varepsilon > 0$, and a and $a'$ do not depend on $\varepsilon$, and so
we must have $a = a'$, as desired.


One should caution that Clairaut’s theorem fails if we do not assume 
the double derivatives to be continuous; see Exercise 6.5.1. 
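
As a numerical illustration of the proof of Clairaut’s theorem, the following is a minimal Python sketch. It assumes a smooth test function on $\mathbf{R}^2$ that is not taken from the text, evaluates the quantity $X$ from the proof (with $e_i, e_j$ the two coordinate directions), and compares $X/\delta^2$ with the mixed partial derivative at the origin, which for this hypothetical function equals 1 in either order (computed by hand).

```python
import math


def f(x, y):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.sin(x) * math.exp(y) + x * y ** 2


def second_difference_quotient(func, delta):
    """Return X / delta^2, where X is the quantity used in the proof:
    X = f(delta, delta) - f(delta, 0) - f(0, delta) + f(0, 0)."""
    x_quantity = (func(delta, delta) - func(delta, 0.0)
                  - func(0.0, delta) + func(0.0, 0.0))
    return x_quantity / delta ** 2


# Both mixed partials of this f at the origin equal 1 (computed by hand).
EXACT_MIXED_PARTIAL = 1.0

if __name__ == "__main__":
    for delta in (1e-1, 1e-2, 1e-3):
        approx = second_difference_quotient(f, delta)
        print(delta, approx, abs(approx - EXACT_MIXED_PARTIAL))
```

Because $X$ is symmetric in the two directions, the same quotient approximates both $a$ and $a'$, which is the heart of the argument.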


— Exercises — 


Exercise 6.5.1. Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x,y) := \frac{xy^3}{x^2+y^2}$
when $(x,y) \neq (0,0)$, and $f(0,0) := 0$. Show that $f$ is continuously differentiable,
and the double derivatives $\frac{\partial}{\partial y}\frac{\partial f}{\partial x}$ and $\frac{\partial}{\partial x}\frac{\partial f}{\partial y}$ exist, but are not equal to
each other at $(0,0)$. Explain why this does not contradict Clairaut’s theorem.


6.6 The contraction mapping theorem 


Before we turn to the next topic - namely, the inverse function theorem 
- we need to develop a useful fact from the theory of complete metric 
spaces, namely the contraction mapping theorem. 


Definition 6.6.1 (Contraction). Let $(X,d)$ be a metric space, and
let $f : X \to X$ be a map. We say that $f$ is a contraction if we
have $d(f(x), f(y)) \le d(x,y)$ for all $x,y \in X$. We say that $f$ is a
strict contraction if there exists a constant $0 < c < 1$ such that
$d(f(x), f(y)) \le c\, d(x,y)$ for all $x,y \in X$; we call $c$ the contraction
constant of $f$.


Examples 6.6.2. The map $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x + 1$ is a
contraction but not a strict contraction. The map $f : \mathbf{R} \to \mathbf{R}$ defined by
$f(x) := x/2$ is a strict contraction. The map $f : [0,1] \to [0,1]$ defined
by $f(x) := x - x^2$ is a contraction but not a strict contraction. (For
justifications of these statements, see Exercise 6.6.5.)


Definition 6.6.3 (Fixed points). Let $f : X \to X$ be a map, and $x \in X$.
We say that $x$ is a fixed point of $f$ if $f(x) = x$.


Contractions do not necessarily have any fixed points; for instance,
the map $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) = x + 1$ does not. However, it turns
out that strict contractions always do, at least when $X$ is complete:


Theorem 6.6.4 (Contraction mapping theorem). Let $(X,d)$ be a metric
space, and let $f : X \to X$ be a strict contraction. Then $f$ can have at
most one fixed point. Moreover, if we also assume that $X$ is non-empty
and complete, then $f$ has exactly one fixed point.


Proof. See Exercise 6.6.7. 
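
The iteration suggested in the hint to Exercise 6.6.7 can be tried out numerically. The following Python sketch is illustrative only: the map $f(x) = x/2 + 1$ on $\mathbf{R}$ (a strict contraction with contraction constant $1/2$ and fixed point $2$), the function names, and the tolerance are all assumptions of this example rather than part of the text.

```python
def iterate_to_fixed_point(f, x0, tol=1e-12, max_steps=10_000):
    """Iterate x_{n+1} = f(x_n) until successive iterates differ by at most tol."""
    x = x0
    for _ in range(max_steps):
        x_next = f(x)
        if abs(x_next - x) <= tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not settle within max_steps")


def example_map(x):
    # Hypothetical strict contraction on R: |f(x) - f(y)| = |x - y| / 2.
    return x / 2 + 1


if __name__ == "__main__":
    print(iterate_to_fixed_point(example_map, x0=0.0))
```

Stopping once successive iterates are close is reasonable for a strict contraction with constant $c$, because the triangle inequality gives $d(x_n, x_*) \le d(x_{n+1}, x_n)/(1-c)$ for the fixed point $x_*$.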


Remark 6.6.5. The contraction mapping theorem is one example of
a fixed point theorem - a theorem which guarantees, assuming certain
conditions, that a map will have a fixed point. There are a number of
other fixed point theorems which are also useful. One amusing one is
the so-called hairy ball theorem, which (among other things) states that
any continuous map $f : S^2 \to S^2$ from the sphere
$S^2 := \{(x,y,z) \in \mathbf{R}^3 : x^2 + y^2 + z^2 = 1\}$ to itself must contain either a fixed point, or
an anti-fixed point (a point $x \in S^2$ such that $f(x) = -x$). A proof of
this theorem can be found in any topology text; it is beyond the scope
of this text.


We shall give one consequence of the contraction mapping theorem 
which is important for our application to the inverse function theorem. 
Basically, this says that any map f on a ball which is a “small” pertur- 
bation of the identity map, remains one-to-one and cannot create any 
internal holes in the ball. 


Lemma 6.6.6. Let $B(0,r)$ be a ball in $\mathbf{R}^n$ centered at the origin, and
let $g : B(0,r) \to \mathbf{R}^n$ be a map such that $g(0) = 0$ and

$$\|g(x) - g(y)\| \le \frac{1}{2}\|x - y\|$$

for all $x,y \in B(0,r)$ (here $\|x\|$ denotes the length of $x$ in $\mathbf{R}^n$). Then the
function $f : B(0,r) \to \mathbf{R}^n$ defined by $f(x) := x + g(x)$ is one-to-one, and
furthermore the image $f(B(0,r))$ of this map contains the ball $B(0,r/2)$.


Proof. We first show that $f$ is one-to-one. Suppose for sake of contradiction
that we had two different points $x,y \in B(0,r)$ such that
$f(x) = f(y)$. But then we would have $x + g(x) = y + g(y)$, and hence

$$\|g(x) - g(y)\| = \|x - y\|.$$

The only way this can be consistent with our hypothesis $\|g(x) - g(y)\| \le \frac{1}{2}\|x - y\|$
is if $\|x - y\| = 0$, i.e., if $x = y$, a contradiction. Thus $f$ is one-to-one.

Now we show that $f(B(0,r))$ contains $B(0,r/2)$. Let $y$ be any point
in $B(0,r/2)$; our objective is to find a point $x \in B(0,r)$ such that
$f(x) = y$, or in other words that $x = y - g(x)$. So the problem is now
to find a fixed point of the map $x \mapsto y - g(x)$.

Let $F : B(0,r) \to B(0,r)$ denote the function $F(x) := y - g(x)$.
Observe that if $x \in B(0,r)$, then

$$\|F(x)\| \le \|y\| + \|g(x)\| \le \frac{r}{2} + \|g(x) - g(0)\| \le \frac{r}{2} + \frac{1}{2}\|x - 0\| < \frac{r}{2} + \frac{r}{2} = r,$$


so $F$ does indeed map $B(0,r)$ to itself. The same argument shows that
for a sufficiently small $\varepsilon > 0$, $F$ maps the closed ball $\overline{B(0,r-\varepsilon)}$ to itself.
Also, for any $x, x'$ in $B(0,r)$ we have

$$\|F(x) - F(x')\| = \|g(x') - g(x)\| \le \frac{1}{2}\|x' - x\|$$

so $F$ is a strict contraction on $B(0,r)$, and hence on the complete space
$\overline{B(0,r-\varepsilon)}$. By the contraction mapping theorem, $F$ has a fixed point,
i.e., there exists an $x$ such that $x = y - g(x)$. But this means that
$f(x) = y$, as desired.
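
The second half of this proof is constructive enough to try on a computer: to solve $x + g(x) = y$, one iterates the map $F(x) = y - g(x)$. Below is a minimal one-dimensional Python sketch, assuming the hypothetical choice $g(x) = \frac{1}{2}\sin(x)$ on the ball $B(0,1) = (-1,1)$; this $g$ satisfies $g(0) = 0$ and $|g(x) - g(x')| \le \frac{1}{2}|x - x'|$, but it is not taken from the text.

```python
import math


def g(x):
    # Hypothetical perturbation with g(0) = 0 and Lipschitz constant 1/2.
    return 0.5 * math.sin(x)


def solve_perturbed_identity(y, steps=60):
    """Approximate the x with x + g(x) = y by iterating F(x) = y - g(x) from x = 0."""
    x = 0.0
    for _ in range(steps):
        x = y - g(x)
    return x


if __name__ == "__main__":
    y = 0.3  # a point of B(0, r/2) with r = 1
    x = solve_perturbed_identity(y)
    print(x, x + g(x))
```

The fixed number of steps is a simplification; since each step at least halves the error, sixty steps are far more than double precision can resolve.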


— Exercises — 


Exercise 6.6.1. Let $f : [a,b] \to \mathbf{R}$ be a differentiable function of one variable
such that $|f'(x)| \le 1$ for all $x \in [a,b]$. Prove that $f$ is a contraction. (Hint:
use the mean-value theorem, Corollary 10.2.9.) If in addition $|f'(x)| < 1$ for
all $x \in [a,b]$ and $f'$ is continuous, show that $f$ is a strict contraction.


Exercise 6.6.2. Show that if $f : [a,b] \to \mathbf{R}$ is differentiable and is a contraction,
then $|f'(x)| \le 1$.


Exercise 6.6.3. Give an example of a function $f : [a,b] \to \mathbf{R}$ which is continuously
differentiable and such that $|f(x) - f(y)| < |x - y|$ for all distinct
$x,y \in [a,b]$, but such that $|f'(x)| = 1$ for at least one value of $x \in [a,b]$.


Exercise 6.6.4. Give an example of a function $f : [a,b] \to \mathbf{R}$ which is a strict
contraction but which is not differentiable for at least one point $x$ in $[a,b]$.


Exercise 6.6.5. Verify the claims in Examples 6.6.2. 


Exercise 6.6.6. Show that every contraction on a metric space X is necessarily 
continuous. 


Exercise 6.6.7. Prove Theorem 6.6.4. (Hint: to prove that there is at most one
fixed point, argue by contradiction. To prove that there is at least one fixed
point, pick any $x_0 \in X$ and define recursively $x_1 = f(x_0)$, $x_2 = f(x_1)$, $x_3 = f(x_2)$,
etc. Prove inductively that $d(x_{n+1}, x_n) \le c^n d(x_1, x_0)$, and conclude
(using the geometric series formula, Lemma 7.3.3) that the sequence $(x_n)_{n=0}^\infty$
is a Cauchy sequence. Then prove that the limit of this sequence is a fixed
point of $f$.)

Exercise 6.6.8. Let $(X,d)$ be a complete metric space, and let $f : X \to X$ and
$g : X \to X$ be two strict contractions on $X$ with contraction coefficients $c$ and
$c'$ respectively. From Theorem 6.6.4 we know that $f$ has some fixed point $x_0$,
and $g$ has some fixed point $y_0$. Suppose we know that there is an $\varepsilon > 0$ such
that $d(f(x), g(x)) \le \varepsilon$ for all $x \in X$ (i.e., $f$ and $g$ are within $\varepsilon$ of each other
in the uniform metric). Show that $d(x_0, y_0) \le \varepsilon/(1 - \min(c,c'))$. Thus nearby
contractions have nearby fixed points.


6.7 The inverse function theorem in several variable 
calculus 


We recall the inverse function theorem in single variable calculus (Theorem
10.4.2), which asserts that if a function $f : \mathbf{R} \to \mathbf{R}$ is invertible,
differentiable, and $f'(x_0)$ is non-zero, then $f^{-1}$ is differentiable at $f(x_0)$,
and

$$(f^{-1})'(f(x_0)) = \frac{1}{f'(x_0)}.$$

In fact, one can say something even when $f$ is not invertible, as
long as we know that $f$ is continuously differentiable. If $f'(x_0)$ is non-zero,
then $f'(x_0)$ must be either strictly positive or strictly negative,
which implies (since we are assuming $f'$ to be continuous) that $f'(x)$ is
either strictly positive for $x$ near $x_0$, or strictly negative for $x$ near $x_0$.
In particular, $f$ must be either strictly increasing near $x_0$, or strictly
decreasing near $x_0$. In either case, $f$ will become invertible if we restrict
the domain and range of $f$ to be sufficiently close to $x_0$ and to $f(x_0)$
respectively. (The technical terminology for this is that $f$ is locally
invertible near $x_0$.)

The requirement that f be continuously differentiable is important; 
see Exercise 6.7.1. 

It turns out that a similar theorem is true for functions $f : \mathbf{R}^n \to \mathbf{R}^n$
from one Euclidean space to the same space. However, the condition that
$f'(x_0)$ is non-zero must be replaced with a slightly different one, namely
that $f'(x_0)$ is invertible. We first remark that the inverse of a linear
transformation is also linear:


Lemma 6.7.1. Let $T : \mathbf{R}^n \to \mathbf{R}^n$ be a linear transformation which is
also invertible. Then the inverse transformation $T^{-1} : \mathbf{R}^n \to \mathbf{R}^n$ is also
linear.


Proof. See Exercise 6.7.2. 


We can now prove an important and useful theorem, arguably one 
of the most important theorems in several variable differential calculus. 


Theorem 6.7.2 (Inverse function theorem). Let $E$ be an open subset
of $\mathbf{R}^n$, and let $f : E \to \mathbf{R}^n$ be a function which is continuously differentiable
on $E$. Suppose $x_0 \in E$ is such that the linear transformation
$f'(x_0) : \mathbf{R}^n \to \mathbf{R}^n$ is invertible. Then there exists an open set $U$ in
$E$ containing $x_0$, and an open set $V$ in $\mathbf{R}^n$ containing $f(x_0)$, such that
$f$ is a bijection from $U$ to $V$. In particular, there is an inverse map
$f^{-1} : V \to U$. Furthermore, this inverse map is differentiable at $f(x_0)$,
and

$$(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}.$$


Proof. We first observe that once we know the inverse map $f^{-1}$ is differentiable,
the formula $(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$ is automatic. This
comes from starting with the identity

$$I = f^{-1} \circ f$$

on $U$, where $I : \mathbf{R}^n \to \mathbf{R}^n$ is the identity map $Ix := x$, and then
differentiating both sides using the chain rule at $x_0$ to obtain

$$I'(x_0) = (f^{-1})'(f(x_0)) f'(x_0).$$

Since $I'(x_0) = I$, we thus have $(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$ as desired.
We remark that this argument shows that if $f'(x_0)$ is not invertible,
then there is no way that an inverse $f^{-1}$ can exist and be differentiable
at $f(x_0)$.

Next, we observe that it suffices to prove the theorem under the
additional assumption $f(x_0) = 0$. The general case then follows from
the special case by replacing $f$ by a new function $\tilde f(x) := f(x) - f(x_0)$
and then applying the special case to $\tilde f$ (note that $V$ will have to shift
by $f(x_0)$). Note that $f^{-1}(y) = \tilde f^{-1}(y - f(x_0))$ (why?). Henceforth we
will always assume $f(x_0) = 0$.

In a similar manner, one can make the assumption $x_0 = 0$. The
general case then follows from this case by replacing $f$ by a new function
$\tilde f(x) := f(x + x_0)$ and applying the special case to $\tilde f$ (note that $E$ and
$U$ will have to shift by $x_0$). Note that $f^{-1}(y) = \tilde f^{-1}(y) + x_0$ - why?
Henceforth we will always assume $x_0 = 0$. Thus we now have that
$f(0) = 0$ and that $f'(0)$ is invertible.

Finally, one can assume that $f'(0) = I$, where $I : \mathbf{R}^n \to \mathbf{R}^n$ is the
identity transformation $Ix = x$. The general case then follows from
this case by replacing $f$ with a new function $\tilde f : E \to \mathbf{R}^n$ defined by
$\tilde f(x) := f'(0)^{-1} f(x)$, and applying the special case to $\tilde f$. Note
from Lemma 6.7.1 that $f'(0)^{-1}$ is a linear transformation. In particular,
we note that $\tilde f(0) = 0$ and that

$$\tilde f'(0) = f'(0)^{-1} f'(0) = I,$$

so by the special case of the inverse function theorem we know that
there exists an open set $U'$ containing $0$, and an open set $V'$ containing
$0$, such that $\tilde f$ is a bijection from $U'$ to $V'$, and that $\tilde f^{-1} : V' \to U'$
is differentiable at $0$ with derivative $I$. But we have $f(x) = f'(0) \tilde f(x)$,
and hence $f$ is a bijection from $U'$ to $f'(0)(V')$ (note that $f'(0)$ is also
a bijection). Since $f'(0)$ and its inverse are both continuous, $f'(0)(V')$
is open, and it certainly contains $0$. Now consider the inverse function
$f^{-1} : f'(0)(V') \to U'$. Since $f(x) = f'(0) \tilde f(x)$, we see that
$f^{-1}(y) = \tilde f^{-1}(f'(0)^{-1} y)$ for all $y \in f'(0)(V')$ (why? use the fact that $\tilde f$ is a
bijection from $U'$ to $V'$). In particular we see that $f^{-1}$ is differentiable
at $0$.

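So all we have to do now is prove the inverse function theorem in the
special case, when $x_0 = 0$, $f(x_0) = 0$, and $f'(x_0) = I$. Let $g : E \to \mathbf{R}^n$
denote the function $g(x) := f(x) - x$. Then $g(0) = 0$ and $g'(0) = 0$. In particular

$$\frac{\partial g}{\partial x_j}(0) = 0$$

for $j = 1, \ldots, n$. Since $g$ is continuously differentiable, there thus exists
a ball $B(0,r)$ in $E$ such that

$$\left\| \frac{\partial g}{\partial x_j}(x) \right\| \le \frac{1}{2n^2}$$

for all $x \in B(0,r)$. (There is nothing particularly special about $\frac{1}{2n^2}$, we
just need a nice small number here.) In particular, for any $x \in B(0,r)$
and $v = (v_1, \ldots, v_n)$ we have

$$\|D_v g(x)\| = \left\| \sum_{j=1}^n v_j \frac{\partial g}{\partial x_j}(x) \right\| \le \sum_{j=1}^n |v_j| \left\| \frac{\partial g}{\partial x_j}(x) \right\| \le \sum_{j=1}^n \|v\| \frac{1}{2n^2} = \frac{1}{2n}\|v\|.$$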

But now for any $x,y \in B(0,r)$, we have by the fundamental theorem of
calculus

$$g(y) - g(x) = \int_0^1 \frac{d}{dt} g(x + t(y-x))\, dt = \int_0^1 D_{y-x} g(x + t(y-x))\, dt.$$

By the previous remark, the vectors $D_{y-x} g(x + t(y-x))$ have a magnitude
of at most $\frac{1}{2n}\|y - x\|$. Thus every component of these vectors has
magnitude at most $\frac{1}{2n}\|y - x\|$. Thus every component of $g(y) - g(x)$ has
magnitude at most $\frac{1}{2n}\|y - x\|$, and hence $g(y) - g(x)$ itself has magnitude
at most $\frac{1}{2}\|y - x\|$ (actually, it will be substantially less than this,
but this bound will be enough for our purposes). In other words, $g$ is a
contraction. By Lemma 6.6.6, the map $f = g + I$ is thus one-to-one on
$B(0,r)$, and the image $f(B(0,r))$ contains $B(0,r/2)$. In particular we
have an inverse map $f^{-1} : B(0,r/2) \to B(0,r)$ defined on $B(0,r/2)$.

Applying the contraction bound with $y = 0$ we obtain in particular
that

$$\|g(x)\| \le \frac{1}{2}\|x\|$$

for all $x \in B(0,r)$, and so by the triangle inequality

$$\frac{1}{2}\|x\| \le \|f(x)\| \le \frac{3}{2}\|x\|$$

for all $x \in B(0,r)$.

Now we set $V := B(0,r/2)$ and $U := f^{-1}(B(0,r/2))$. Then by
construction $f$ is a bijection from $U$ to $V$. $V$ is clearly open, and
$U = f^{-1}(V)$ is also open since $f$ is continuous. (Notice that if a set is open
relative to $B(0,r)$, then it is open in $\mathbf{R}^n$ as well.) Now we want to show
that $f^{-1} : V \to U$ is differentiable at $0$ with derivative $I^{-1} = I$. In other
words, we wish to show that

$$\lim_{x \to 0;\, x \in V \setminus \{0\}} \frac{\|f^{-1}(x) - f^{-1}(0) - I(x - 0)\|}{\|x\|} = 0.$$

Since $f(0) = 0$, we have $f^{-1}(0) = 0$, and the above simplifies to

$$\lim_{x \to 0;\, x \in V \setminus \{0\}} \frac{\|f^{-1}(x) - x\|}{\|x\|} = 0.$$


Let $(x_n)_{n=1}^\infty$ be any sequence in $V \setminus \{0\}$ that converges to $0$. By Proposition
3.1.5(b), it suffices to show that

$$\lim_{n \to \infty} \frac{\|f^{-1}(x_n) - x_n\|}{\|x_n\|} = 0.$$

Write $y_n := f^{-1}(x_n)$. Then $y_n \in B(0,r)$ and $x_n = f(y_n)$. In particular
we have

$$\frac{1}{2}\|y_n\| \le \|x_n\| \le \frac{3}{2}\|y_n\|$$

and so since $\|x_n\|$ goes to $0$, $\|y_n\|$ goes to zero also, and their ratio
remains bounded. It will thus suffice to show that

$$\lim_{n \to \infty} \frac{\|y_n - f(y_n)\|}{\|y_n\|} = 0.$$


But since $y_n$ is going to $0$, and $f$ is differentiable at $0$, we have

$$\lim_{n \to \infty} \frac{\|f(y_n) - f(0) - f'(0)(y_n - 0)\|}{\|y_n\|} = 0$$

as desired (since $f(0) = 0$ and $f'(0) = I$).

The inverse function theorem gives a useful criterion for when a function
is (locally) invertible at a point $x_0$ - all we need is for its derivative
$f'(x_0)$ to be invertible (and then we even get further information, for
instance we can compute the derivative of $f^{-1}$ at $f(x_0)$). Of course, this
begs the question of how one can tell whether the linear transformation
$f'(x_0)$ is invertible or not. Recall that we have $f'(x_0) = L_{Df(x_0)}$, so by
Lemmas 6.1.13 and 6.1.16 we see that the linear transformation $f'(x_0)$
is invertible if and only if the matrix $Df(x_0)$ is. There are many ways
to check whether a matrix such as $Df(x_0)$ is invertible; for instance, one
can use determinants, or alternatively Gaussian elimination methods.
We will not pursue this matter here, but refer the reader to any linear
algebra text.
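
To make the derivative formula $(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$ concrete, here is a small numerical check in Python. It assumes, purely for illustration, the polar-coordinate map $f(r,\theta) = (r\cos\theta, r\sin\theta)$, whose local inverse near a point with $r > 0$ and $-\pi < \theta < \pi$ is known in closed form; the helper names are hypothetical. The sketch compares a finite-difference derivative matrix of the inverse at $f(x_0)$ with the inverse of the matrix $Df(x_0)$.

```python
import math


def f(r, theta):
    # Polar-to-Cartesian map, used here as an illustrative example.
    return (r * math.cos(theta), r * math.sin(theta))


def f_inverse(u, v):
    # Local inverse of f, valid for r > 0 and -pi < theta < pi.
    return (math.hypot(u, v), math.atan2(v, u))


def jacobian_of_f(r, theta):
    # The derivative matrix Df(r, theta), computed by hand.
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def inverse_2x2(m):
    # Inverse of a 2x2 matrix via the determinant (assumed non-zero).
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def numerical_jacobian(func, point, h=1e-6):
    """Central-difference approximation to the derivative matrix of a map R^2 -> R^2."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus = list(point)
        minus = list(point)
        plus[j] += h
        minus[j] -= h
        f_plus = func(*plus)
        f_minus = func(*minus)
        for i in range(2):
            rows[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return rows


if __name__ == "__main__":
    x0 = (2.0, 0.5)
    lhs = numerical_jacobian(f_inverse, f(*x0))  # approximates (f^{-1})'(f(x0))
    rhs = inverse_2x2(jacobian_of_f(*x0))        # (f'(x0))^{-1}
    print(lhs)
    print(rhs)
```

Here $\det Df(r,\theta) = r$, so the determinant test mentioned above shows that $f'(x_0)$ is invertible exactly when $r \neq 0$.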

If $f'(x_0)$ exists but is non-invertible, then the inverse function theorem
does not apply. In such a situation it is not possible for $f^{-1}$ to exist
and be differentiable at $f(x_0)$; this was remarked in the above proof. But
it is still possible for $f$ to be invertible. For instance, the single-variable
function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) = x^3$ is invertible despite $f'(0)$ not
being invertible.


— Exercises — 

Exercise 6.7.1. Let $f : \mathbf{R} \to \mathbf{R}$ be the function defined by
$f(x) := x + x^2 \sin(1/x^4)$ for $x \neq 0$ and $f(0) := 0$. Show that $f$ is differentiable and
$f'(0) = 1$, but $f$ is not increasing on any open set containing $0$. (Hint: show
that the derivative of $f$ can turn negative arbitrarily close to $0$. Drawing a
graph of $f$ may aid your intuition.)

Exercise 6.7.2. Prove Lemma 6.7.1.

Exercise 6.7.3. Let $f : \mathbf{R}^n \to \mathbf{R}^n$ be a continuously differentiable function such
that $f'(x)$ is an invertible linear transformation for every $x \in \mathbf{R}^n$. Show that
whenever $V$ is an open set in $\mathbf{R}^n$, $f(V)$ is also open. (Hint: use the inverse
function theorem.)


6.8 The implicit function theorem 


Recall (from Exercise 3.5.10) that a function $f : \mathbf{R} \to \mathbf{R}$ gives rise to a
graph

$$\{(x, f(x)) : x \in \mathbf{R}\}$$

which is a subset of $\mathbf{R}^2$, usually looking like a curve. However, not all
curves are graphs; they must obey the vertical line test, that for every
$x$ there is exactly one $y$ such that $(x,y)$ is in the curve. For instance,
the circle $\{(x,y) \in \mathbf{R}^2 : x^2 + y^2 = 1\}$ is not a graph, although if one
restricts to a semicircle such as $\{(x,y) \in \mathbf{R}^2 : x^2 + y^2 = 1, y > 0\}$ then
one again obtains a graph. Thus while the entire circle is not a graph,
certain local portions of it are. (The portions of the circle near $(1,0)$
and $(-1,0)$ are not graphs over the variable $x$, but they are graphs over
the variable $y$.)

Similarly, any function $g : \mathbf{R}^n \to \mathbf{R}$ gives rise to a graph
$\{(x, g(x)) : x \in \mathbf{R}^n\}$ in $\mathbf{R}^{n+1}$, which in general looks like some sort
of $n$-dimensional surface in $\mathbf{R}^{n+1}$ (the technical term for this is a hypersurface).
Conversely, one may ask which hypersurfaces are actually
graphs of some function, and whether that function is continuous or
differentiable.

If the hypersurface is given geometrically, then one can again invoke
the vertical line test to work out whether it is a graph or not. But
what if the hypersurface is given algebraically, for instance the surface
$\{(x,y,z) \in \mathbf{R}^3 : xy + yz + zx = -1\}$? Or more generally, a hypersurface
of the form $\{x \in \mathbf{R}^n : g(x) = 0\}$, where $g : \mathbf{R}^n \to \mathbf{R}$ is some function?
In this case, it is still possible to say whether the hypersurface is a graph,
locally at least, by means of the implicit function theorem.


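Theorem 6.8.1 (Implicit function theorem). Let $E$ be an open subset of
$\mathbf{R}^n$, let $f : E \to \mathbf{R}$ be continuously differentiable, and let $y = (y_1, \ldots, y_n)$
be a point in $E$ such that $f(y) = 0$ and $\frac{\partial f}{\partial x_n}(y) \neq 0$. Then there exists an
open subset $U$ of $\mathbf{R}^{n-1}$ containing $(y_1, \ldots, y_{n-1})$, an open subset $V$ of $E$
containing $y$, and a function $g : U \to \mathbf{R}$ such that $g(y_1, \ldots, y_{n-1}) = y_n$,
and

$$\{(x_1, \ldots, x_n) \in V : f(x_1, \ldots, x_n) = 0\}
= \{(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) : (x_1, \ldots, x_{n-1}) \in U\}.$$

In other words, the set $\{x \in V : f(x) = 0\}$ is a graph of a function over
$U$. Moreover, $g$ is differentiable at $(y_1, \ldots, y_{n-1})$, and we have

$$\frac{\partial g}{\partial x_j}(y_1, \ldots, y_{n-1}) = -\frac{\partial f}{\partial x_j}(y) \Big/ \frac{\partial f}{\partial x_n}(y) \qquad (6.1)$$

for all $1 \le j \le n-1$.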


Remark 6.8.2. The equation (6.1) is sometimes derived using implicit
differentiation. Basically, the point is that if you know that

$$f(x_1, \ldots, x_n) = 0$$


then (as long as $\frac{\partial f}{\partial x_n} \neq 0$) the variable $x_n$ is “implicitly” defined in terms
of the other $n-1$ variables, and one can differentiate the above identity
in, say, the $x_j$ direction using the chain rule to obtain

$$\frac{\partial f}{\partial x_j} + \frac{\partial f}{\partial x_n} \frac{\partial x_n}{\partial x_j} = 0$$

which is (6.1) in disguise (we are using $g$ to represent the implicit function
defining $x_n$ in terms of $x_1, \ldots, x_{n-1}$). Thus, the implicit function
theorem allows one to define a dependence implicitly, by means of a constraint
rather than by a direct formula of the form $x_n = g(x_1, \ldots, x_{n-1})$.
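
As a quick worked instance of implicit differentiation, consider the unit circle, which is discussed elsewhere in this section. This is only a sketch with $n = 2$, writing $(x, y)$ for $(x_1, x_2)$ and taking $f(x,y) := x^2 + y^2 - 1$, at a point where $\frac{\partial f}{\partial y} = 2y \neq 0$:

```latex
\begin{align*}
x^2 + y^2 - 1 &= 0 \\
2x + 2y \, \frac{\partial y}{\partial x} &= 0 \\
\frac{\partial y}{\partial x} &= -\frac{x}{y}
  = -\frac{\partial f}{\partial x} \Big/ \frac{\partial f}{\partial y}.
\end{align*}
```

On the upper semicircle one can solve explicitly, $y = \sqrt{1 - x^2}$, and direct differentiation gives $-x/\sqrt{1-x^2} = -x/y$, in agreement with the implicit computation.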


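Proof. This theorem looks somewhat fearsome, but actually it is a fairly
quick consequence of the inverse function theorem. Let $F : E \to \mathbf{R}^n$ be
the function

$$F(x_1, \ldots, x_n) := (x_1, \ldots, x_{n-1}, f(x_1, \ldots, x_n)).$$

This function is continuously differentiable. Also note that

$$F(y) = (y_1, \ldots, y_{n-1}, 0)$$

and

$$DF(y) = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 \\
\frac{\partial f}{\partial x_1}(y) & \frac{\partial f}{\partial x_2}(y) & \cdots & \frac{\partial f}{\partial x_{n-1}}(y) & \frac{\partial f}{\partial x_n}(y)
\end{pmatrix}.$$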

Since $\frac{\partial f}{\partial x_n}(y)$ is assumed by hypothesis to be non-zero, this matrix is
invertible; this can be seen either by computing the determinant, or
using row reduction, or by computing the inverse explicitly, which is

$$DF(y)^{-1} = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 \\
-\frac{\partial f}{\partial x_1}(y)/a & -\frac{\partial f}{\partial x_2}(y)/a & \cdots & -\frac{\partial f}{\partial x_{n-1}}(y)/a & 1/a
\end{pmatrix}$$

where we have written $a = \frac{\partial f}{\partial x_n}(y)$ for short. Thus the inverse function
theorem applies, and we can find an open set $V$ in $E$ containing $y$, and an
open set $W$ in $\mathbf{R}^n$ containing $F(y) = (y_1, \ldots, y_{n-1}, 0)$, such that $F$ is a
bijection from $V$ to $W$, and that $F^{-1}$ is differentiable at $(y_1, \ldots, y_{n-1}, 0)$.
Let us write $F^{-1}$ in co-ordinates as

$$F^{-1}(x) = (h_1(x), h_2(x), \ldots, h_n(x))$$

where $x \in W$. Since $F(F^{-1}(x)) = x$, we have $h_j(x_1, \ldots, x_n) = x_j$ for all
$1 \le j \le n-1$ and $x \in W$, and

$$f(x_1, \ldots, x_{n-1}, h_n(x_1, \ldots, x_n)) = x_n.$$

Also, $h_n$ is differentiable at $(y_1, \ldots, y_{n-1}, 0)$ since $F^{-1}$ is.

Now we set $U := \{(x_1, \ldots, x_{n-1}) \in \mathbf{R}^{n-1} : (x_1, \ldots, x_{n-1}, 0) \in W\}$.
Note that $U$ is open and contains $(y_1, \ldots, y_{n-1})$. Now we define $g : U \to \mathbf{R}$
by $g(x_1, \ldots, x_{n-1}) := h_n(x_1, \ldots, x_{n-1}, 0)$. Then $g$ is differentiable at
$(y_1, \ldots, y_{n-1})$. Now we prove that

$$\{(x_1, \ldots, x_n) \in V : f(x_1, \ldots, x_n) = 0\}
= \{(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) : (x_1, \ldots, x_{n-1}) \in U\}.$$

First suppose that $(x_1, \ldots, x_n) \in V$ and $f(x_1, \ldots, x_n) = 0$. Then
we have $F(x_1, \ldots, x_n) = (x_1, \ldots, x_{n-1}, 0)$, which lies in $W$. Thus
$(x_1, \ldots, x_{n-1})$ lies in $U$. Applying $F^{-1}$, we see that
$(x_1, \ldots, x_n) = F^{-1}(x_1, \ldots, x_{n-1}, 0)$. In particular $x_n = h_n(x_1, \ldots, x_{n-1}, 0)$, and hence
$x_n = g(x_1, \ldots, x_{n-1})$. Thus every element of the left-hand set lies in the
right-hand set. The reverse inclusion comes by reversing all the above
steps and is left to the reader.

Finally, we show the formula for the partial derivatives of $g$. From
the preceding discussion we have

$$f(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) = 0$$

for all $(x_1, \ldots, x_{n-1}) \in U$. Since $g$ is differentiable at $(y_1, \ldots, y_{n-1})$, and
$f$ is differentiable at $(y_1, \ldots, y_{n-1}, g(y_1, \ldots, y_{n-1})) = y$, we may use the
chain rule, differentiating in $x_j$, to obtain

$$\frac{\partial f}{\partial x_j}(y) + \frac{\partial f}{\partial x_n}(y) \frac{\partial g}{\partial x_j}(y_1, \ldots, y_{n-1}) = 0$$


and the claim follows by simple algebra.


Example 6.8.3. Consider the surface $S := \{(x,y,z) \in \mathbf{R}^3 : xy + yz + zx = -1\}$,
which we rewrite as $\{(x,y,z) \in \mathbf{R}^3 : f(x,y,z) = 0\}$, where
$f : \mathbf{R}^3 \to \mathbf{R}$ is the function $f(x,y,z) := xy + yz + zx + 1$. Clearly $f$ is
continuously differentiable, and $\frac{\partial f}{\partial z} = y + x$. Thus for any $(x_0, y_0, z_0)$ in $S$
with $y_0 + x_0 \neq 0$, one can write this surface (near $(x_0, y_0, z_0)$) as a graph
of the form $\{(x, y, g(x,y)) : (x,y) \in U\}$ for some open set $U$ containing
$(x_0, y_0)$, and some function $g$ which is differentiable at $(x_0, y_0)$. Indeed
one can implicitly differentiate to obtain that

$$\frac{\partial g}{\partial x}(x_0, y_0) = -\frac{y_0 + z_0}{y_0 + x_0} \quad \text{and} \quad \frac{\partial g}{\partial y}(x_0, y_0) = -\frac{x_0 + z_0}{y_0 + x_0}.$$
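
The implicit differentiation in this example can be sanity-checked numerically. Here the implicit function happens to be available in closed form, $g(x,y) = -(1 + xy)/(x + y)$, obtained by solving $xy + yz + zx = -1$ for $z$. The following Python sketch (the base point and the helper names are arbitrary choices made for illustration) compares finite-difference derivatives of $g$ with the values predicted by (6.1).

```python
def g(x, y):
    # Solving xy + yz + zx = -1 for z, valid when x + y != 0.
    return -(1.0 + x * y) / (x + y)


def partials_by_formula(x0, y0):
    """Partial derivatives of g at (x0, y0) predicted by implicit differentiation:
    dg/dx = -(y0 + z0)/(y0 + x0) and dg/dy = -(x0 + z0)/(y0 + x0)."""
    z0 = g(x0, y0)
    dg_dx = -(y0 + z0) / (y0 + x0)
    dg_dy = -(x0 + z0) / (y0 + x0)
    return dg_dx, dg_dy


def partials_by_differences(x0, y0, h=1e-6):
    """Central-difference approximations to the same partial derivatives."""
    dg_dx = (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h)
    dg_dy = (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)
    return dg_dx, dg_dy


if __name__ == "__main__":
    print(partials_by_formula(1.0, 2.0))
    print(partials_by_differences(1.0, 2.0))
```

In general no closed form for $g$ is available, which is exactly when the formula (6.1) is most useful; the closed form is used here only so that the check is independent of the theorem.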


In the implicit function theorem, if the derivative $\frac{\partial f}{\partial x_n}$ equals zero at
some point, then it is unlikely that the set $\{x \in \mathbf{R}^n : f(x) = 0\}$ can
be written as a graph of the $x_n$ variable in terms of the other $n-1$
variables near that point. However, if some other derivative $\frac{\partial f}{\partial x_j}$ is non-zero,
then it would be possible to write the $x_j$ variable in terms of the other
$n-1$ variables, by a variant of the implicit function theorem. Thus
as long as the gradient $\nabla f$ is not entirely zero, one can write this set
$\{x \in \mathbf{R}^n : f(x) = 0\}$ as a graph of some variable $x_j$ in terms of the
other $n-1$ variables. (The circle $\{(x,y) \in \mathbf{R}^2 : x^2 + y^2 - 1 = 0\}$ is a
good example of this; it is not a graph of $y$ in terms of $x$, or $x$ in terms
of $y$, but near every point it is one of the two. And this is because the
gradient of $x^2 + y^2 - 1$ is never zero on the circle.) However, if $\nabla f$ does
vanish at some point $x_0$, then we say that $f$ has a critical point at $x_0$
and the behavior there is much more complicated. For instance, the set
$\{(x,y) \in \mathbf{R}^2 : x^2 - y^2 = 0\}$ has a critical point at $(0,0)$ and there the
set does not look like a graph of any sort (it is the union of two lines).


Remark 6.8.4. Sets which look like graphs of continuous functions at
every point have a name: they are called manifolds. Thus
$\{x \in \mathbf{R}^n : f(x) = 0\}$ will be a manifold if it contains no critical points of $f$. The
theory of manifolds is very important in modern geometry (especially
differential geometry and algebraic geometry), but we will not discuss it
here as it is a graduate level topic.


Chapter 7 


Lebesgue measure 


In the previous chapter we discussed differentiation in several variable
calculus. It is now only natural to consider the question of integration in
several variable calculus. The general question we wish to answer is this:
given some subset $\Omega$ of $\mathbf{R}^n$, and some real-valued function $f : \Omega \to \mathbf{R}$,
is it possible to integrate $f$ on $\Omega$ to obtain some number $\int_\Omega f$? (It is
possible to consider other types of functions, such as complex-valued
or vector-valued functions, but this turns out not to be too difficult
once one knows how to integrate real-valued functions, since one can
integrate a complex or vector valued function by integrating each real-valued
component of that function separately.)

In one dimension we already have developed (in Chapter 11) the
notion of a Riemann integral $\int_{[a,b]} f$, which answers this question when
$\Omega$ is an interval $\Omega = [a,b]$, and $f$ is Riemann integrable. Exactly what
Riemann integrability means is not important here, but let us just remark
that every piecewise continuous function is Riemann integrable,
and in particular every piecewise constant function is Riemann integrable.
However, not all functions are Riemann integrable. It is possible
to extend this notion of a Riemann integral to higher dimensions, but it
requires quite a bit of effort and one can still only integrate “Riemann
integrable” functions, which turn out to be a rather unsatisfactorily
small class of functions. (For instance, the pointwise limit of Riemann
integrable functions need not be Riemann integrable, and the same goes
for an $L^2$ limit, although we have already seen that uniform limits of
Riemann integrable functions remain Riemann integrable.)

Because of this, we must look beyond the Riemann integral to obtain
a truly satisfactory notion of integration, one that can handle even very
discontinuous functions. This leads to the notion of the Lebesgue integral,
which we shall spend this chapter and the next constructing. The
Lebesgue integral can handle a very large class of functions, including
all the Riemann integrable functions but also many others as well; in 
fact, it is safe to say that it can integrate virtually any function that one 
actually needs in mathematics, at least if one works on Euclidean spaces 
and everything is absolutely integrable. (If one assumes the axiom of 
choice, then there are still some pathological functions one can construct 
which cannot be integrated by the Lebesgue integral, but these functions 
will not come up in real-life applications.) 

Before we turn to the details, we begin with an informal discussion.
In order to understand how to compute an integral $\int_\Omega f$, we must first
understand a more basic and fundamental question: how does one compute
the length/area/volume of $\Omega$? To see why this question is connected
to that of integration, observe that if one integrates the function $1$ on the
set $\Omega$, then one should obtain the length of $\Omega$ (if $\Omega$ is one-dimensional),
the area of $\Omega$ (if $\Omega$ is two-dimensional), or the volume of $\Omega$ (if $\Omega$ is
three-dimensional). To avoid splitting into cases depending on the dimension,
we shall refer to the measure of $\Omega$ as either the length, area, volume, (or
hypervolume, etc.) of $\Omega$, depending on what Euclidean space $\mathbf{R}^n$ we are
working in.

Ideally, to every subset $\Omega$ of $\mathbf{R}^n$ we would like to associate a non-negative
number $m(\Omega)$, which will be the measure of $\Omega$ (i.e., the length,
area, volume, etc.). We allow the possibility for $m(\Omega)$ to be zero (e.g., if
$\Omega$ is just a single point or the empty set) or for $m(\Omega)$ to be infinite (e.g., if
$\Omega$ is all of $\mathbf{R}^n$). This measure should obey certain reasonable properties;
for instance, the measure of the unit cube
$(0,1)^n := \{(x_1, \ldots, x_n) : 0 < x_i < 1\}$ should equal $1$, we should have $m(A \cup B) = m(A) + m(B)$ if
$A$ and $B$ are disjoint, we should have $m(A) \le m(B)$ whenever $A \subseteq B$,
and we should have $m(x + A) = m(A)$ for any $x \in \mathbf{R}^n$ (i.e., if we shift
$A$ by the vector $x$ the measure should be the same).

Remarkably, it turns out that such a measure does not exist; one
cannot assign a non-negative number to every subset of $\mathbf{R}^n$ which has the
above properties. This is quite a surprising fact, as it goes against one’s
intuitive concept of volume; we shall prove it later in these notes. (An
even more dramatic example of this failure of intuition is the Banach-Tarski
paradox, in which a unit ball in $\mathbf{R}^3$ is decomposed into five pieces,
and then the five pieces are reassembled via translations and rotations to
form two complete and disjoint unit balls, thus violating any concept of
conservation of volume; however we will not discuss this paradox here.)

What these paradoxes mean is that it is impossible to find a reasonable
way to assign a measure to every single subset of $\mathbf{R}^n$. However, we
can salvage matters by only measuring a certain class of sets in $\mathbf{R}^n$ -
the measurable sets. These are the only sets $\Omega$ for which we will define
the measure $m(\Omega)$, and once one restricts one’s attention to measurable
sets, one recovers all the above properties again. Furthermore, almost
all the sets one encounters in real life are measurable (e.g., all open and
closed sets will be measurable), and so this turns out to be good enough
to do analysis.


7.1 The goal: Lebesgue measure 


Let $\mathbf{R}^n$ be a Euclidean space. Our goal in this chapter is to define
a concept of measurable set, which will be a special kind of subset of
$\mathbf{R}^n$, and for every such measurable set $\Omega \subseteq \mathbf{R}^n$, we will define the
Lebesgue measure $m(\Omega)$ to be a certain number in $[0, \infty]$. The concept
of measurable set will obey the following properties:


(i) (Borel property) Every open set in $\mathbf{R}^n$ is measurable, as is every
closed set.


(ii) (Complementarity) If $\Omega$ is measurable, then $\mathbf{R}^n \setminus \Omega$ is also measurable.


(iii) (Boolean algebra property) If $(\Omega_j)_{j \in J}$ is any finite collection of
measurable sets (so $J$ is finite), then the union $\bigcup_{j \in J} \Omega_j$ and
intersection $\bigcap_{j \in J} \Omega_j$ are also measurable.


(iv) ($\sigma$-algebra property) If $(\Omega_j)_{j \in J}$ is any countable collection of
measurable sets (so $J$ is countable), then the union $\bigcup_{j \in J} \Omega_j$ and
intersection $\bigcap_{j \in J} \Omega_j$ are also measurable.


Note that some of these properties are redundant; for instance, (iv)
will imply (iii), and once one knows all open sets are measurable, (ii) will
imply that all closed sets are measurable also. The properties (i-iv) will
ensure that virtually every set one cares about is measurable; though as
indicated in the introduction, there do exist non-measurable sets.

To every measurable set $\Omega$, we associate the Lebesgue measure $m(\Omega)$
of $\Omega$, which will obey the following properties:


(v) (Empty set) The empty set $\emptyset$ has measure $m(\emptyset) = 0$.


(vi) (Positivity) We have $0 \le m(\Omega) \le +\infty$ for every measurable set $\Omega$.


(vii) (Monotonicity) If $A \subseteq B$, and $A$ and $B$ are both measurable, then
$m(A) \le m(B)$.


(viii) (Finite sub-additivity) If $(A_j)_{j \in J}$ are a finite collection of measurable
sets, then $m(\bigcup_{j \in J} A_j) \le \sum_{j \in J} m(A_j)$.


(ix) (Finite additivity) If $(A_j)_{j \in J}$ are a finite collection of disjoint measurable
sets, then $m(\bigcup_{j \in J} A_j) = \sum_{j \in J} m(A_j)$.


(x) (Countable sub-additivity) If $(A_j)_{j \in J}$ are a countable collection of
measurable sets, then $m(\bigcup_{j \in J} A_j) \le \sum_{j \in J} m(A_j)$.


(xi) (Countable additivity) If $(A_j)_{j \in J}$ are a countable collection of disjoint
measurable sets, then $m(\bigcup_{j \in J} A_j) = \sum_{j \in J} m(A_j)$.


(xii) (Normalization) The unit cube
$[0,1]^n = \{(x_1, \ldots, x_n) \in \mathbf{R}^n : 0 \le x_j \le 1 \text{ for all } 1 \le j \le n\}$ has measure $m([0,1]^n) = 1$.


(xiii) (Translation invariance) If $\Omega$ is a measurable set, and $x \in \mathbf{R}^n$, then
$x + \Omega := \{x + y : y \in \Omega\}$ is also measurable, and $m(x + \Omega) = m(\Omega)$.


Again, many of these properties are redundant; for instance the 
countable additivity property can be used to deduce the finite addi- 
tivity property, which in turn can be used to derive monotonicity (when 
combined with the positivity property). One can also obtain the
sub-additivity properties from the additivity ones. Note that $m(\Omega)$ can be
$+\infty$, and so in particular some of the sums in the above properties may
also equal $+\infty$. (Since everything is positive we will never have to deal
with indeterminate forms such as $-\infty + +\infty$.)

Our goal for this chapter can then be stated thus: 


Theorem 7.1.1 (Existence of Lebesgue measure). There exists a
concept of a measurable set, and a way to assign a number $m(\Omega)$ to
every measurable subset $\Omega \subseteq \mathbf{R}^n$, which obeys all of the properties
(i)-(xiii).

It turns out that Lebesgue measure is pretty much unique; any other 
concept of measurability and measure which obeys axioms (i)-(xiii) will 
largely coincide with the construction we give. However there are other 
measures which obey only some of the above axioms; also, we may be
interested in concepts of measure for other domains than Euclidean spaces
$\mathbf{R}^n$. This leads to measure theory, which is an entire subject in itself
and will not be pursued here; however we do remark that the concept
of measures is very important in modern probability, and in the finer
points of analysis (e.g., in the theory of distributions).


7.2 First attempt: Outer measure 


Before we construct Lebesgue measure, we first discuss a somewhat naive 
approach to finding the measure of a set - namely, we try to cover the 
set by boxes, and then add up the volume of each box. This approach 
will almost work, giving us a concept called outer measure which can be 
applied to every set and obeys all of the properties (v)-(xiii) except for 
the additivity properties (ix), (xi). Later we will have to modify outer 
measure slightly to recover the additivity property. 
We begin by starting with the notion of an open box. 


Definition 7.2.1 (Open box). An open box (or box for short) $B$ in $\mathbf{R}^n$
is any set of the form

$$B = \prod_{i=1}^n (a_i, b_i) := \{(x_1,\ldots,x_n) \in \mathbf{R}^n : x_i \in (a_i, b_i) \text{ for all } 1 \le i \le n\},$$

where $b_i \ge a_i$ are real numbers. We define the volume $\mathrm{vol}(B)$ of this
box to be the number

$$\mathrm{vol}(B) := \prod_{i=1}^n (b_i - a_i) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n).$$


For instance, the unit cube $(0,1)^n$ is a box, and has volume 1. In one
dimension $n = 1$, boxes are the same as open intervals. One can easily
check that in any dimension open boxes are indeed open. Note
that if we have $b_i = a_i$ for some $i$, then the box becomes empty, and has
volume 0, but we still consider this to be a box (albeit a rather silly one).
Sometimes we will use $\mathrm{vol}_n(B)$ instead of $\mathrm{vol}(B)$ to emphasize that we
are dealing with $n$-dimensional volume, thus for instance $\mathrm{vol}_1(B)$ would
be the length of a one-dimensional box $B$, $\mathrm{vol}_2(B)$ would be the area of
a two-dimensional box $B$, etc.
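
As a small illustration of Definition 7.2.1 (a sketch only, not part of the text's development), the volume of a box can be computed directly from its endpoints. The function name `box_volume` and the representation of a box as a list of `(a_i, b_i)` pairs are assumptions made for this example.

```python
from fractions import Fraction
from math import prod

def box_volume(sides):
    """Volume of the open box prod (a_i, b_i), given as a list of (a_i, b_i) pairs.

    Assumes b_i >= a_i for every side, as in Definition 7.2.1.
    """
    for a, b in sides:
        if b < a:
            raise ValueError("each side must satisfy b_i >= a_i")
    return prod(b - a for a, b in sides)

unit_cube = [(0, 1)] * 3                            # the unit cube (0,1)^3
slab = [(0, 2), (Fraction(1, 2), Fraction(1, 2))]   # b_2 = a_2, so the box is empty
print(box_volume(unit_cube), box_volume(slab))
```

The degenerate box `slab` has one side of length zero, so its volume is 0, matching the remark about "silly" boxes above.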


Remark 7.2.2. We of course expect the measure m(B) of a box to be 
the same as the volume vol(B) of that box. This is in fact an inevitable 
consequence of the axioms (i)-(xiii) (see Exercise 7.2.5). 


Definition 7.2.3 (Covering by boxes). Let $\Omega \subseteq \mathbf{R}^n$ be a subset of $\mathbf{R}^n$.
We say that a collection $(B_j)_{j \in J}$ of boxes cover $\Omega$ iff $\Omega \subseteq \bigcup_{j \in J} B_j$.
Suppose $\Omega \subseteq \mathbf{R}^n$ can be covered by a finite or countable collection
of boxes $(B_j)_{j \in J}$. If we wish $\Omega$ to be measurable, and if we wish to have
a measure obeying the monotonicity and sub-additivity properties (vii),
(viii), (x) and if we wish $m(B_j) = \mathrm{vol}(B_j)$ for every box $B_j$, then we must
have

$$m(\Omega) \le m\Big(\bigcup_{j \in J} B_j\Big) \le \sum_{j \in J} m(B_j) = \sum_{j \in J} \mathrm{vol}(B_j).$$


We thus conclude 


$$m(\Omega) \le \inf\Big\{ \sum_{j \in J} \mathrm{vol}(B_j) : (B_j)_{j \in J} \text{ covers } \Omega;\ J \text{ at most countable} \Big\}.$$


Inspired by this, we define 


Definition 7.2.4 (Outer measure). If $\Omega$ is a set, we define the outer
measure $m^*(\Omega)$ of $\Omega$ to be the quantity

$$m^*(\Omega) := \inf\Big\{ \sum_{j \in J} \mathrm{vol}(B_j) : (B_j)_{j \in J} \text{ covers } \Omega;\ J \text{ at most countable} \Big\}.$$


Since $\sum_{j=1}^\infty \mathrm{vol}(B_j)$ is non-negative, we know that $m^*(\Omega) \ge 0$ for all
$\Omega$. However, it is quite possible that $m^*(\Omega)$ could equal $+\infty$. Note that
because we are allowing ourselves to use a countable number of boxes,
every subset of $\mathbf{R}^n$ has at least one countable cover by boxes; in
fact $\mathbf{R}^n$ itself can be covered by countably many translates of the unit
cube $(0,1)^n$ (how?). We will sometimes write $m^*_n(\Omega)$ instead of $m^*(\Omega)$
to emphasize the fact that we are using $n$-dimensional outer measure.

Note that outer measure can be defined for every single set (not just 
the measurable ones), because we can take the infimum of any non-empty 
set. It obeys several of the desired properties of a measure: 


Lemma 7.2.5 (Properties of outer measure). Outer measure has the 
following six properties: 


(v) (Empty set) The empty set $\emptyset$ has outer measure $m^*(\emptyset) = 0$.

(vi) (Positivity) We have $0 \le m^*(\Omega) \le +\infty$ for every measurable set
$\Omega$.

(vii) (Monotonicity) If $A \subseteq B \subseteq \mathbf{R}^n$, then $m^*(A) \le m^*(B)$.

(viii) (Finite sub-additivity) If $(A_j)_{j \in J}$ are a finite collection of subsets
of $\mathbf{R}^n$, then $m^*(\bigcup_{j \in J} A_j) \le \sum_{j \in J} m^*(A_j)$.

(x) (Countable sub-additivity) If $(A_j)_{j \in J}$ are a countable collection of
subsets of $\mathbf{R}^n$, then $m^*(\bigcup_{j \in J} A_j) \le \sum_{j \in J} m^*(A_j)$.

(xiii) (Translation invariance) If $\Omega$ is a subset of $\mathbf{R}^n$, and $x \in \mathbf{R}^n$, then
$m^*(x + \Omega) = m^*(\Omega)$.


Proof. See Exercise 7.2.1. 


The outer measure of a closed box is also what we expect: 


Proposition 7.2.6 (Outer measure of closed box). For any closed
box

$$B = \prod_{i=1}^n [a_i, b_i] := \{(x_1,\ldots,x_n) \in \mathbf{R}^n : x_i \in [a_i, b_i] \text{ for all } 1 \le i \le n\},$$

we have

$$m^*(B) = \prod_{i=1}^n (b_i - a_i).$$


Proof. Clearly, we can cover the closed box $B = \prod_{i=1}^n [a_i, b_i]$ by the open
box $\prod_{i=1}^n (a_i - \varepsilon, b_i + \varepsilon)$ for every $\varepsilon > 0$. Thus we have

$$m^*(B) \le \mathrm{vol}\Big(\prod_{i=1}^n (a_i - \varepsilon, b_i + \varepsilon)\Big) = \prod_{i=1}^n (b_i - a_i + 2\varepsilon)$$

for every $\varepsilon > 0$. Taking limits as $\varepsilon \to 0$, we obtain

$$m^*(B) \le \prod_{i=1}^n (b_i - a_i).$$

To finish the proof, we need to show that

$$m^*(B) \ge \prod_{i=1}^n (b_i - a_i).$$

By the definition of $m^*(B)$, it suffices to show that

$$\sum_{j \in J} \mathrm{vol}(B_j) \ge \prod_{i=1}^n (b_i - a_i)$$

whenever $(B_j)_{j \in J}$ is a finite or countable cover of $B$.

Since $B$ is closed and bounded, it is compact (by the Heine-Borel
theorem, Theorem 1.5.7), and in particular every open cover has a
finite subcover (Theorem 1.5.8). Thus to prove the above inequality for
countable covers, it suffices to do it for finite covers (since if $(B_j)_{j \in J'}$ is
a finite subcover of $(B_j)_{j \in J}$ then $\sum_{j \in J} \mathrm{vol}(B_j)$ will be greater than or
equal to $\sum_{j \in J'} \mathrm{vol}(B_j)$).

To summarize, our goal is now to prove that

$$\sum_{j \in J} \mathrm{vol}(B^{(j)}) \ge \prod_{i=1}^n (b_i - a_i) \qquad (7.1)$$

whenever $(B^{(j)})_{j \in J}$ is a finite cover of $\prod_{i=1}^n [a_i, b_i]$; we have changed the
subscript $B_j$ to superscript $B^{(j)}$ because we will need the subscripts to
denote components.

To prove the inequality (7.1), we shall use induction on the dimension
$n$. First we consider the base case $n = 1$. Here $B$ is just a closed interval
$B = [a, b]$, and each box $B^{(j)}$ is just an open interval $B^{(j)} = (a_j, b_j)$. We
have to show that

$$\sum_{j \in J} (b_j - a_j) \ge (b - a).$$


To do this we use the Riemann integral. For each $j \in J$, let $f^{(j)} : \mathbf{R} \to \mathbf{R}$
be the function such that $f^{(j)}(x) = 1$ when $x \in (a_j, b_j)$ and $f^{(j)}(x) = 0$
otherwise. Then we have that $f^{(j)}$ is Riemann integrable (because it is
piecewise constant, and compactly supported) and

$$\int_{-\infty}^{\infty} f^{(j)} = b_j - a_j.$$

Summing this over all $j \in J$, and interchanging the integral with the
finite sum, we have

$$\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} = \sum_{j \in J} (b_j - a_j).$$
But since the intervals $(a_j, b_j)$ cover $[a, b]$, we have $\sum_{j \in J} f^{(j)}(x) \ge 1$ for
all $x \in [a, b]$ (why?). For all other values of $x$, we have $\sum_{j \in J} f^{(j)}(x) \ge 0$.
Thus

$$\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} \ge \int_{[a,b]} 1 = b - a$$

and the claim follows by combining this inequality with the previous
equality. This proves (7.1) when $n = 1$.
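
The one-dimensional case of (7.1) can be checked on a concrete example. The following is a minimal sketch, not the proof itself: the helper names `covers_closed_interval` and `total_length` are hypothetical, and intervals are represented as `(lo, hi)` pairs of exact fractions. It verifies that a given finite family of open intervals covers $[a, b]$ and then compares the total length with $b - a$.

```python
from fractions import Fraction

def covers_closed_interval(intervals, a, b):
    """Return True if the open intervals (lo, hi) in `intervals` cover [a, b]."""
    point = a  # leftmost point of [a, b] not yet known to be covered
    while point <= b:
        # Right endpoints of the open intervals that contain `point`.
        reaches = [hi for lo, hi in intervals if lo < point < hi]
        if not reaches:
            return False  # `point` lies in no interval of the collection
        point = max(reaches)  # [old point, new point) is now covered
    return True

def total_length(intervals):
    """Sum of the lengths b_j - a_j of the intervals."""
    return sum(hi - lo for lo, hi in intervals)

F = Fraction
cover = [(F(-1, 10), F(2, 5)), (F(3, 10), F(4, 5)), (F(7, 10), F(11, 10))]
assert covers_closed_interval(cover, 0, 1)
assert total_length(cover) >= 1 - 0   # consistent with (7.1) for n = 1
print(total_length(cover))
```

Note that the single open interval $(0, 1)$ does not cover $[0, 1]$, since it misses both endpoints; the check above reports this correctly.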

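Now assume inductively that $n > 1$, and we have already proven the
inequality (7.1) for dimensions $n - 1$. We shall use a similar argument
to the preceding one. Each box $B^{(j)}$ is now of the form

$$B^{(j)} = \prod_{i=1}^n (a_i^{(j)}, b_i^{(j)}).$$

We can write this as

$$B^{(j)} = A^{(j)} \times (a_n^{(j)}, b_n^{(j)})$$

where $A^{(j)}$ is the $n-1$-dimensional box $A^{(j)} := \prod_{i=1}^{n-1} (a_i^{(j)}, b_i^{(j)})$. Note
that

$$\mathrm{vol}(B^{(j)}) = \mathrm{vol}_{n-1}(A^{(j)}) (b_n^{(j)} - a_n^{(j)})$$

where we have subscripted $\mathrm{vol}_{n-1}$ by $n - 1$ to emphasize that this is
$n-1$-dimensional volume being referred to here. We similarly write

$$B = A \times [a_n, b_n]$$

where $A := \prod_{i=1}^{n-1} [a_i, b_i]$, and again note that

$$\mathrm{vol}(B) = \mathrm{vol}_{n-1}(A)(b_n - a_n).$$

For each $j \in J$, let $f^{(j)}$ be the function such that $f^{(j)}(x_n) =
\mathrm{vol}_{n-1}(A^{(j)})$ for all $x_n \in (a_n^{(j)}, b_n^{(j)})$, and $f^{(j)}(x_n) = 0$ for all other $x_n$.
Then $f^{(j)}$ is Riemann integrable and

$$\int_{-\infty}^{\infty} f^{(j)} = \mathrm{vol}_{n-1}(A^{(j)})(b_n^{(j)} - a_n^{(j)}) = \mathrm{vol}(B^{(j)})$$

and hence

$$\sum_{j \in J} \mathrm{vol}(B^{(j)}) = \int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)}.$$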




Now let $x_n \in [a_n, b_n]$ and $(x_1,\ldots,x_{n-1}) \in A$. Then $(x_1,\ldots,x_n)$ lies in
$B$, and hence lies in one of the $B^{(j)}$. Clearly we have $x_n \in (a_n^{(j)}, b_n^{(j)})$, and
$(x_1,\ldots,x_{n-1}) \in A^{(j)}$. In particular, we see that for each $x_n \in [a_n, b_n]$,
the set

$$\{A^{(j)} : j \in J;\ x_n \in (a_n^{(j)}, b_n^{(j)})\}$$

of $n-1$-dimensional boxes covers $A$. Applying the inductive hypothesis
(7.1) at dimension $n - 1$ we thus see that

$$\sum_{j \in J : x_n \in (a_n^{(j)}, b_n^{(j)})} \mathrm{vol}_{n-1}(A^{(j)}) \ge \mathrm{vol}_{n-1}(A),$$

or in other words

$$\sum_{j \in J} f^{(j)}(x_n) \ge \mathrm{vol}_{n-1}(A).$$

Integrating this over $[a_n, b_n]$, we obtain

$$\int_{[a_n, b_n]} \sum_{j \in J} f^{(j)} \ge \mathrm{vol}_{n-1}(A)(b_n - a_n) = \mathrm{vol}(B)$$

and in particular

$$\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} \ge \mathrm{vol}_{n-1}(A)(b_n - a_n) = \mathrm{vol}(B)$$

since $\sum_{j \in J} f^{(j)}$ is always non-negative. Combining this with our
previous identity for $\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)}$ we obtain (7.1), and the induction is
complete.


Once we obtain the measure of a closed box, the corresponding result 
for an open box is easy: 


Corollary 7.2.7. For any open box

$$B = \prod_{i=1}^n (a_i, b_i) := \{(x_1,\ldots,x_n) \in \mathbf{R}^n : x_i \in (a_i, b_i) \text{ for all } 1 \le i \le n\},$$

we have

$$m^*(B) = \prod_{i=1}^n (b_i - a_i).$$


In particular, outer measure obeys the normalization (xii). 
Proof. We may assume that $b_i > a_i$ for all $i$, since if $b_i = a_i$ this follows
from Lemma 7.2.5(v). Now observe that

$$\prod_{i=1}^n [a_i + \varepsilon, b_i - \varepsilon] \subseteq \prod_{i=1}^n (a_i, b_i) \subseteq \prod_{i=1}^n [a_i, b_i]$$

for all $\varepsilon > 0$, assuming that $\varepsilon$ is small enough that $b_i - \varepsilon > a_i + \varepsilon$ for all
$i$. Applying Proposition 7.2.6 and Lemma 7.2.5(vii) we obtain

$$\prod_{i=1}^n (b_i - a_i - 2\varepsilon) \le m^*\Big(\prod_{i=1}^n (a_i, b_i)\Big) \le \prod_{i=1}^n (b_i - a_i).$$

Sending $\varepsilon \to 0$ and using the squeeze test (Corollary 6.4.14), one obtains
the result.


We now compute some examples of outer measure on the real line 
R. 


Example 7.2.8. Let us compute the one-dimensional measure of $\mathbf{R}$.
Since $(-R, R) \subseteq \mathbf{R}$ for all $R > 0$, we have

$$m^*(\mathbf{R}) \ge m^*((-R, R)) = 2R$$

by Corollary 7.2.7. Letting $R \to +\infty$ we thus see that $m^*(\mathbf{R}) = +\infty$.


Example 7.2.9. Now let us compute the one-dimensional measure of
$\mathbf{Q}$. From Proposition 7.2.6 we see that for each rational number $q$, the
point $\{q\}$ has outer measure $m^*(\{q\}) = 0$. Since $\mathbf{Q}$ is clearly the union
$\mathbf{Q} = \bigcup_{q \in \mathbf{Q}} \{q\}$ of all these rational points $q$, and $\mathbf{Q}$ is countable, we have

$$m^*(\mathbf{Q}) \le \sum_{q \in \mathbf{Q}} m^*(\{q\}) = \sum_{q \in \mathbf{Q}} 0 = 0,$$

and so $m^*(\mathbf{Q})$ must equal zero. In fact, the same argument shows that
every countable set has measure zero. (This, incidentally, gives another
proof that the real numbers are uncountable, Corollary 8.3.4.)


Remark 7.2.10. One consequence of the fact that $m^*(\mathbf{Q}) = 0$ is that
given any $\varepsilon > 0$, it is possible to cover the rationals $\mathbf{Q}$ by a countable
number of intervals whose total length is less than $\varepsilon$. This fact is
somewhat un-intuitive; can you find a more explicit way to construct such a
countable covering of $\mathbf{Q}$ by short intervals?
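
One possible explicit construction (offered here as an illustrative sketch, not as the text's intended answer) is to enumerate the rationals as $q_1, q_2, q_3, \ldots$ and to surround $q_k$ by an interval of length $\varepsilon/2^{k+1}$, so that the total length is at most $\varepsilon/2$. The enumeration order and the names `rationals` and `short_cover` are assumptions of this example; only finitely many intervals can be generated in practice, so the code inspects the first `n` of them.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals():
    """Enumerate every rational exactly once: 0, 1, -1, 2, -2, 1/2, -1/2, ..."""
    for s in count(1):                 # s = numerator + denominator
        for q in range(1, s + 1):
            p = s - q
            if gcd(p, q) == 1:         # only fractions in lowest terms
                yield Fraction(p, q)
                if p != 0:
                    yield Fraction(-p, q)

def short_cover(eps, n):
    """First n intervals of the cover; the k-th one has length eps / 2**(k + 1)."""
    cover = []
    for k, q in enumerate(islice(rationals(), n), start=1):
        half = eps / 2 ** (k + 2)      # half-width of the k-th interval
        cover.append((q - half, q + half))
    return cover

eps = Fraction(1, 100)
cover = short_cover(eps, 50)
total = sum(hi - lo for lo, hi in cover)
assert total < eps / 2                 # partial sums never exceed eps / 2
print(float(total))
```

Because the lengths form a geometric series, adding more intervals never pushes the total past $\varepsilon/2$, no matter how many rationals are covered.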
Example 7.2.11. Now let us compute the one-dimensional measure of
the irrationals $\mathbf{R} \setminus \mathbf{Q}$. From finite sub-additivity we have

$$m^*(\mathbf{R}) \le m^*(\mathbf{R} \setminus \mathbf{Q}) + m^*(\mathbf{Q}).$$

Since $\mathbf{Q}$ has outer measure 0, and $\mathbf{R}$ has outer measure $+\infty$, we
thus see that the irrationals $\mathbf{R} \setminus \mathbf{Q}$ have outer measure $+\infty$. A similar
argument shows that $[0,1] \setminus \mathbf{Q}$, the irrationals in $[0,1]$, have outer measure
1 (why?).


Example 7.2.12. By Proposition 7.2.6, the unit interval $[0,1]$ in $\mathbf{R}$ has
one-dimensional outer measure 1, but the unit interval $\{(x, 0) : 0 \le x \le
1\}$ in $\mathbf{R}^2$ has two-dimensional outer measure 0. Thus one-dimensional
outer measure and two-dimensional outer measure are quite different.
Note that the above remarks and countable sub-additivity imply that the
entire $x$-axis of $\mathbf{R}^2$ has two-dimensional outer measure 0, despite the
fact that $\mathbf{R}$ has infinite one-dimensional measure.
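
The vanishing two-dimensional outer measure of the segment can also be seen directly from the definition, by covering it with a single thin open rectangle. The sketch below is an alternative illustration rather than the argument used in the text; the name `thin_rectangle_area` and the choice of rectangle are assumptions of this example.

```python
from fractions import Fraction

def thin_rectangle_area(delta):
    """Area of the open box (-delta, 1 + delta) x (-delta, delta), which covers
    the segment {(x, 0) : 0 <= x <= 1} for every delta > 0."""
    return (1 + 2 * delta) * (2 * delta)

for k in (1, 2, 4, 8):
    delta = Fraction(1, 10 ** k)
    print(k, float(thin_rectangle_area(delta)))
```

Each printed area is an upper bound for the two-dimensional outer measure of the segment, and the bounds shrink to 0 as `delta` does.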


— Exercises — 


Exercise 7.2.1. Prove Lemma 7.2.5. (Hint: you will have to use the definition
of inf, and probably introduce a parameter $\varepsilon$. You may have to treat separately
the cases when certain outer measures are equal to $+\infty$. (viii) can be deduced
from (x) and (v). For (x), label the index set $J$ as $J = \{j_1, j_2, j_3, \ldots\}$, and for
each $A_{j_k}$, pick a covering of $A_{j_k}$ by boxes whose total volume is no larger than
$m^*(A_{j_k}) + \varepsilon/2^k$.)


Exercise 7.2.2. Let $A$ be a subset of $\mathbf{R}^n$, and let $B$ be a subset of $\mathbf{R}^m$. Note that
the Cartesian product $\{(a, b) : a \in A, b \in B\}$ is then a subset of $\mathbf{R}^{n+m}$. Show
that $m^*_{n+m}(A \times B) \le m^*_n(A) m^*_m(B)$. (It is in fact true that $m^*_{n+m}(A \times B) =
m^*_n(A) m^*_m(B)$, but this is substantially harder to prove).

In Exercises 7.2.3-7.2.5, we assume that $\mathbf{R}^n$ is a Euclidean space, and we
have a notion of measurable set in $\mathbf{R}^n$ (which may or may not coincide with
the notion of Lebesgue measurable set) and a notion of measure (which may
or may not coincide with Lebesgue measure) which obeys axioms (i)-(xiii).


Exercise 7.2.3. 


(a) Show that if $A_1 \subseteq A_2 \subseteq A_3 \ldots$ is an increasing sequence of
measurable sets (so $A_j \subseteq A_{j+1}$ for every positive integer $j$), then we have
$m(\bigcup_{j=1}^\infty A_j) = \lim_{j \to \infty} m(A_j)$.

(b) Show that if $A_1 \supseteq A_2 \supseteq A_3 \ldots$ is a decreasing sequence of measurable
sets (so $A_j \supseteq A_{j+1}$ for every positive integer $j$), and $m(A_1) < +\infty$, then
we have $m(\bigcap_{j=1}^\infty A_j) = \lim_{j \to \infty} m(A_j)$.
Exercise 7.2.4. Show that for any positive integer $q > 1$, that the open box
$$(0, 1/q)^n := \{(x_1,\ldots,x_n) \in \mathbf{R}^n : 0 < x_j < 1/q \text{ for all } 1 \le j \le n\}$$
and the closed box
$$[0, 1/q]^n := \{(x_1,\ldots,x_n) \in \mathbf{R}^n : 0 \le x_j \le 1/q \text{ for all } 1 \le j \le n\}$$
both measure $q^{-n}$. (Hint: first show that $m((0,1/q)^n) \le q^{-n}$ for every $q \ge 1$
by covering $(0,1)^n$ by some translates of $(0,1/q)^n$. Using a similar argument,
show that $m([0,1/q]^n) \ge q^{-n}$. Then show that $m([0,1/q]^n \setminus (0,1/q)^n) \le \varepsilon$ for
every $\varepsilon > 0$, by covering the boundary of $[0,1/q]^n$ with some very small boxes.)

Exercise 7.2.5. Show that for any box $B$, that $m(B) = \mathrm{vol}(B)$. (Hint: first
prove this when the co-ordinates $a_i$, $b_i$ are rational, using Exercise 7.2.4. Then
take limits somehow (perhaps using Q1) to obtain the general case when the
co-ordinates are real.)

Exercise 7.2.6. Use Lemma 7.2.5 and Proposition 7.2.6 to furnish another proof 
that the reals are uncountable (i.e., reprove Corollary 8.3.4). 


7.3 Outer measure is not additive 


In light of Lemma 7.2.5, it would seem now that all we need to do is 
to verify the additivity properties (ix), (xi), and we have everything we 
need to have a usable measure. Unfortunately, these properties fail for 
outer measure, even in one dimension. 


Proposition 7.3.1 (Failure of countable additivity). There exists
a countable collection $(A_j)_{j \in J}$ of disjoint subsets of $\mathbf{R}$, such that

$$m^*\Big(\bigcup_{j \in J} A_j\Big) \ne \sum_{j \in J} m^*(A_j).$$


Proof. We shall need some notation. Let $\mathbf{Q}$ be the rationals, and $\mathbf{R}$ be
the reals. We say that a set $A \subseteq \mathbf{R}$ is a coset of $\mathbf{Q}$ if it is of the form
$A = x + \mathbf{Q}$ for some real number $x$. For instance, $\sqrt{2} + \mathbf{Q}$ is a coset of $\mathbf{Q}$,
as is $\mathbf{Q}$ itself, since $\mathbf{Q} = 0 + \mathbf{Q}$. Note that a coset $A$ can correspond to
several values of $x$; for instance $2 + \mathbf{Q}$ is exactly the same coset as $0 + \mathbf{Q}$.
Also observe that it is not possible for two cosets to partially overlap;
if $x + \mathbf{Q}$ intersects $y + \mathbf{Q}$ in even just a single point $z$, then $x - y$ must
be rational (why? use the identity $x - y = (x - z) - (y - z)$), and thus
$x + \mathbf{Q}$ and $y + \mathbf{Q}$ must be equal (why?). So any two cosets are either
identical or disjoint.

We observe that every coset $A$ of the rationals $\mathbf{Q}$ has a non-empty
intersection with $[0,1]$. Indeed, if $A$ is a coset, then $A = x + \mathbf{Q}$ for some
real number $x$. If we then pick a rational number $q$ in $[-x, 1 - x]$ then
we see that $x + q \in [0,1]$, and thus $A \cap [0,1]$ contains $x + q$.

Let $\mathbf{R}/\mathbf{Q}$ denote the set of all cosets of $\mathbf{Q}$; note that this is a set
whose elements are themselves sets (of real numbers). For each coset $A$
in $\mathbf{R}/\mathbf{Q}$, let us pick an element $x_A$ of $A \cap [0,1]$. (This requires us to make
an infinite number of choices, and thus requires the axiom of choice, see
Section 8.4.) Let $E$ be the set of all such $x_A$, i.e., $E := \{x_A : A \in \mathbf{R}/\mathbf{Q}\}$.
Note that $E \subseteq [0,1]$ by construction.

Now consider the set

$$X := \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (q + E).$$

Clearly this set is contained in $[-1,2]$ (since $q + x \in [-1,2]$ whenever
$q \in [-1,1]$ and $x \in E \subseteq [0,1]$). We claim that this set contains the
interval $[0,1]$. Indeed, for any $y \in [0,1]$, we know that $y$ must belong
to some coset $A$ (for instance, it belongs to the coset $y + \mathbf{Q}$). But we
also have $x_A$ belonging to the same coset, and thus $y - x_A$ is equal to
some rational $q$. Since $y$ and $x_A$ both live in $[0,1]$, then $q$ lives in $[-1,1]$.
Since $y = q + x_A$, we have $y \in q + E$, and hence $y \in X$ as desired.
We claim that

$$m^*(X) \ne \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E),$$

which would prove the claim. To see why this is true, observe that since
$[0,1] \subseteq X \subseteq [-1,2]$, we have $1 \le m^*(X) \le 3$ by monotonicity
and Proposition 7.2.6. For the right hand side, observe from translation
invariance that

$$\sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E) = \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(E).$$

The set $\mathbf{Q} \cap [-1,1]$ is countably infinite (why?). Thus the right-hand
side is either 0 (if $m^*(E) = 0$) or $+\infty$ (if $m^*(E) > 0$). Either way, it
cannot be between 1 and 3, and the claim follows.


Remark 7.3.2. The above proof used the axiom of choice. This turns 
out to be absolutely necessary; one can prove using some advanced tech- 
niques in mathematical logic that if one does not assume the axiom of 
choice, then it is possible to have a mathematical model where outer 
measure is countably additive. 




One can refine the above argument, and show in fact that m* is not 
finitely additive either: 


Proposition 7.3.3 (Failure of finite additivity). There exists a finite
collection $(A_j)_{j \in J}$ of disjoint subsets of $\mathbf{R}$, such that

$$m^*\Big(\bigcup_{j \in J} A_j\Big) \ne \sum_{j \in J} m^*(A_j).$$
Proof. This is accomplished by an indirect argument. Suppose for sake
of contradiction that $m^*$ was finitely additive. Let $E$ and $X$ be the
sets introduced in Proposition 7.3.1. From countable sub-additivity and
translation invariance we have

$$m^*(X) \le \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E) = \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(E).$$

Since we know that $1 \le m^*(X) \le 3$, we thus have $m^*(E) \ne 0$, since
otherwise we would have $m^*(X) \le 0$, a contradiction.
Since $m^*(E) \ne 0$, there exists a finite integer $n > 0$ such that
$m^*(E) > 1/n$. Now let $J$ be a finite subset of $\mathbf{Q} \cap [-1,1]$ of cardinality
$3n$. If $m^*$ were finitely additive, then we would have

$$m^*\Big(\bigcup_{q \in J} (q + E)\Big) = \sum_{q \in J} m^*(q + E) = \sum_{q \in J} m^*(E) > 3n \cdot \frac{1}{n} = 3.$$

But we know that $\bigcup_{q \in J} (q + E)$ is a subset of $X$, which has outer measure
at most 3. This contradicts monotonicity. Hence $m^*$ cannot be finitely
additive.


Remark 7.3.4. The examples here are related to the Banach-Tarski 
paradox, which demonstrates (using the axiom of choice) that one can 
partition the unit ball in $\mathbf{R}^3$ into a finite number of pieces which, when
rotated and translated, can be reassembled to form two complete unit 
balls! Of course, this partition involves non-measurable sets. We will 
not present this paradox here as it requires some group theory which is 
beyond the scope of this text. 


7.4 Measurable sets 


In the previous section we saw that certain sets were badly behaved with 
respect to outer measure, in particular they could be used to contradict 
finite or countable additivity. However, those sets were rather
pathological, being constructed using the axiom of choice and looking rather
artificial. One would hope to be able to exclude them and then somehow 
recover finite and countable additivity. Fortunately, this can be done, 
thanks to a clever definition of Constantin Carathéodory (1873-1950): 


Definition 7.4.1 (Lebesgue measurability). Let $E$ be a subset of $\mathbf{R}^n$.
We say that $E$ is Lebesgue measurable, or measurable for short, iff we
have the identity

$$m^*(A) = m^*(A \cap E) + m^*(A \setminus E)$$

for every subset $A$ of $\mathbf{R}^n$. If $E$ is measurable, we define the Lebesgue
measure of $E$ to be $m(E) = m^*(E)$; if $E$ is not measurable, we leave
$m(E)$ undefined.


In other words, $E$ being measurable means that if we use the set $E$
to divide up an arbitrary set $A$ into two parts, we keep the additivity
property. Of course, if $m^*$ were finitely additive then every set $E$ would
be measurable; but we know from Proposition 7.3.3 that $m^*$ is not
finitely additive. One can think of the measurable sets as the sets for
which finite additivity works. We sometimes subscript $m(E)$ as $m_n(E)$
to emphasize the fact that we are using $n$-dimensional Lebesgue measure.
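
To get a feel for the splitting identity in Definition 7.4.1, one can test it numerically when $A$ is a finite union of intervals in $\mathbf{R}$ and $E = (0, +\infty)$. The sketch below rests on an assumption that the text has not yet established at this point, namely that the outer measure of a finite union of intervals equals the total length of that union; the helper names `union_length` and `split_by_half_line` are hypothetical. It is a sanity check on examples, not a proof of measurability.

```python
from fractions import Fraction

def union_length(intervals):
    """Total length of a finite union of intervals given as (lo, hi) pairs."""
    total = 0
    end = None  # right endpoint of the part of the line already counted
    for lo, hi in sorted(intervals):
        if hi <= lo:
            continue  # empty interval
        if end is None or lo > end:
            total += hi - lo
            end = hi
        elif hi > end:
            total += hi - end
            end = hi
    return total

def split_by_half_line(intervals):
    """Split each interval into its part in E = (0, +inf) and its part outside E."""
    inside = [(max(lo, 0), hi) for lo, hi in intervals if hi > 0]
    outside = [(lo, min(hi, 0)) for lo, hi in intervals if lo < 0]
    return inside, outside

A = [(-2, 1), (Fraction(1, 2), 3), (5, 6)]
inside, outside = split_by_half_line(A)
assert union_length(A) == union_length(inside) + union_length(outside)
print(union_length(A), union_length(inside), union_length(outside))
```

The pathological sets of Section 7.3 cannot be represented this way, which is consistent with the idea that the identity can only fail for sets far from finite unions of intervals.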

The above definition is somewhat hard to work with, and in practice 
one does not verify a set is measurable directly from this definition. 
Instead, we will use this definition to prove various useful properties of 
measurable sets (Lemmas 7.4.2-7.4.11), and after that we will rely more 
or less exclusively on the properties in those lemmas, and no longer need 
to refer to the above definition. 

We begin by showing that a large number of sets are indeed
measurable. The empty set $E = \emptyset$ and the whole space $E = \mathbf{R}^n$ are clearly
measurable (why?). Here is another example of a measurable set:


Lemma 7.4.2 (Half-spaces are measurable). The half-space
$$\{(x_1,\ldots,x_n) \in \mathbf{R}^n : x_n > 0\}$$
is measurable.


Proof. See Exercise 7.4.3. 


Remark 7.4.3. A similar argument will also show that any half-space
of the form $\{(x_1,\ldots,x_n) \in \mathbf{R}^n : x_j > 0\}$ or $\{(x_1,\ldots,x_n) \in \mathbf{R}^n : x_j < 0\}$
for some $1 \le j \le n$ is measurable.
Now for some more properties of measurable sets.

Lemma 7.4.4 (Properties of measurable sets).

(a) If $E$ is measurable, then $\mathbf{R}^n \setminus E$ is also measurable.

(b) (Translation invariance) If $E$ is measurable, and $x \in \mathbf{R}^n$, then
$x + E$ is also measurable, and $m(x + E) = m(E)$.

(c) If $E_1$ and $E_2$ are measurable, then $E_1 \cap E_2$ and $E_1 \cup E_2$ are
measurable.

(d) (Boolean algebra property) If $E_1, E_2, \ldots, E_N$ are measurable, then
$\bigcup_{j=1}^N E_j$ and $\bigcap_{j=1}^N E_j$ are measurable.


(e) Every open box, and every closed box, is measurable. 


(f) Any set E of outer measure zero (i.e., m*(E) = 0) is measurable. 


Proof. See Exercise 7.4.4. 


From Lemma 7.4.4, we have proven properties (ii), (iii), (xiii) on our 
wish list of measurable sets, and we are making progress towards (i). 
We also have finite additivity (property (ix) on our wish list): 


Lemma 7.4.5 (Finite additivity). If $(E_j)_{j \in J}$ are a finite collection of
disjoint measurable sets and $A$ is any set (not necessarily measurable), we
have

$$m^*\Big(A \cap \bigcup_{j \in J} E_j\Big) = \sum_{j \in J} m^*(A \cap E_j).$$

Furthermore, we have $m(\bigcup_{j \in J} E_j) = \sum_{j \in J} m(E_j)$.


Proof. See Exercise 7.4.6. 


Remark 7.4.6. Lemma 7.4.5 and Proposition 7.3.3, when combined, 
imply that there exist non-measurable sets: see Exercise 7.4.5. 


Corollary 7.4.7. If $A \subseteq B$ are two measurable sets, then $B \setminus A$ is also
measurable, and
$$m(B \setminus A) = m(B) - m(A).$$


Proof. See Exercise 7.4.7. 


Now we show countable additivity. 
Lemma 7.4.8 (Countable additivity). If $(E_j)_{j \in J}$ are a countable
collection of disjoint measurable sets, then $\bigcup_{j \in J} E_j$ is measurable, and

$$m\Big(\bigcup_{j \in J} E_j\Big) = \sum_{j \in J} m(E_j).$$


Proof. Let $E := \bigcup_{j \in J} E_j$. Our first task will be to show that $E$ is
measurable. Thus, let $A$ be an arbitrary set (not necessarily measurable);
we need to show that

$$m^*(A) = m^*(A \cap E) + m^*(A \setminus E).$$

Since $J$ is countable, we may write $J = \{j_1, j_2, j_3, \ldots\}$. Note that

$$A \cap E = \bigcup_{k=1}^\infty (A \cap E_{j_k})$$

(why?) and hence by countable sub-additivity

$$m^*(A \cap E) \le \sum_{k=1}^\infty m^*(A \cap E_{j_k}).$$

We rewrite this as

$$m^*(A \cap E) \le \sup_{N \ge 1} \sum_{k=1}^N m^*(A \cap E_{j_k}).$$

Let $F_N$ be the set $F_N := \bigcup_{k=1}^N E_{j_k}$. Since the $A \cap E_{j_k}$ are all disjoint,
and their union is $A \cap F_N$, we see from Lemma 7.4.5 that

$$\sum_{k=1}^N m^*(A \cap E_{j_k}) = m^*(A \cap F_N)$$

and hence

$$m^*(A \cap E) \le \sup_{N \ge 1} m^*(A \cap F_N).$$

Now we look at $A \setminus E$. Since $F_N \subseteq E$ (why?), we have $A \setminus E \subseteq A \setminus F_N$
(why?). By monotonicity, we thus have

$$m^*(A \setminus E) \le m^*(A \setminus F_N)$$

for all $N$. In particular, we see that

$$m^*(A \cap E) + m^*(A \setminus E) \le \sup_{N \ge 1} m^*(A \cap F_N) + m^*(A \setminus E)
\le \sup_{N \ge 1} \big( m^*(A \cap F_N) + m^*(A \setminus F_N) \big).$$

But from Lemma 7.4.5 we know that $F_N$ is measurable, and hence

$$m^*(A \cap F_N) + m^*(A \setminus F_N) = m^*(A).$$

Putting this all together we obtain

$$m^*(A \cap E) + m^*(A \setminus E) \le m^*(A).$$

But from finite sub-additivity we have

$$m^*(A \cap E) + m^*(A \setminus E) \ge m^*(A)$$

and the claim follows. This shows that $E$ is measurable.
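To finish the lemma, we need to show that $m(E)$ is equal to
$\sum_{j \in J} m(E_j)$. We first observe from countable sub-additivity that

$$m(E) \le \sum_{j \in J} m(E_j) = \sum_{k=1}^\infty m(E_{j_k}).$$

On the other hand, by finite additivity and monotonicity we have

$$m(E) \ge m(F_N) = \sum_{k=1}^N m(E_{j_k})$$

for every $N$. Taking limits as $N \to \infty$ we obtain

$$m(E) \ge \sum_{k=1}^\infty m(E_{j_k}),$$

and thus $m(E) = \sum_{k=1}^\infty m(E_{j_k}) = \sum_{j \in J} m(E_j)$
as desired.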


This proves property (xi) on our wish list. Next, we do countable 
unions and intersections. 


Lemma 7.4.9 ($\sigma$-algebra property). If $(\Omega_j)_{j \in J}$ are any countable
collection of measurable sets (so $J$ is countable), then the union $\bigcup_{j \in J} \Omega_j$
and the intersection $\bigcap_{j \in J} \Omega_j$ are also measurable.

Proof. See Exercise 7.4.8.


The final property left to verify on our wish list is (i). We first need
a preliminary lemma.


Lemma 7.4.10. Every open set can be written as a countable or finite 
union of open boxes. 


Proof. We first need some notation. Call a box $B = \prod_{i=1}^n (a_i, b_i)$ rational
if all of its components $a_i$, $b_i$ are rational numbers. Observe that there are
only a countable number of rational boxes (this is since a rational box
is described by $2n$ rational numbers, and so has the same cardinality
as $\mathbf{Q}^{2n}$. But $\mathbf{Q}$ is countable, and the Cartesian product of any finite
number of countable sets is countable; see Corollaries 8.1.14, 8.1.15).

We make the following claim: given any open ball $B(x, r)$, there
exists a rational box $B$ which is contained in $B(x, r)$ and which contains
$x$. To prove this claim, write $x = (x_1,\ldots,x_n)$. For each $1 \le i \le n$, let
$a_i$ and $b_i$ be rational numbers such that

$$x_i - \frac{r}{n} < a_i < x_i < b_i < x_i + \frac{r}{n}.$$

Then it is clear that the box $\prod_{i=1}^n (a_i, b_i)$ is rational and contains $x$. A
simple computation using Pythagoras' theorem (or the triangle
inequality) also shows that this box is contained in $B(x, r)$; we leave this to the
reader.
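
The choice of rational endpoints in this claim can be made concrete. The following sketch uses one possible recipe, assumed here for illustration: fix a common denominator $D > 2n/r$ and round each $x_i$ down and up to neighbouring multiples of $1/D$. The function name `rational_box` is hypothetical, and floating-point inputs are treated as the exact rationals they represent.

```python
from fractions import Fraction
from math import ceil, floor, sqrt

def rational_box(x, r):
    """Return sides (a_i, b_i) with rational endpoints and
    x_i - r/n < a_i < x_i < b_i < x_i + r/n, where n = len(x) and r > 0."""
    n = len(x)
    D = floor(2 * n / Fraction(r)) + 1   # common denominator with 1/D < r/(2n)
    sides = []
    for xi in x:
        xi = Fraction(xi)                # a float is converted exactly
        a = Fraction(ceil(xi * D) - 1, D)    # largest multiple of 1/D below x_i
        b = Fraction(floor(xi * D) + 1, D)   # smallest multiple of 1/D above x_i
        sides.append((a, b))
    return sides

x = [sqrt(2), -0.3, 5]
r = Fraction(1, 10)
sides = rational_box(x, r)

# Squared distance from x to the farthest corner of the box.
far2 = sum(max((Fraction(xi) - a) ** 2, (b - Fraction(xi)) ** 2)
           for xi, (a, b) in zip(x, sides))
assert far2 < r ** 2                     # so the whole box lies inside B(x, r)
print(sides)
```

Since every point of the box is at least as close to $x$ as the farthest corner, the final assertion confirms the containment that the text leaves to the reader, for this particular example.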

Now let $E$ be an open set, and let $\Sigma$ be the set of all rational boxes
$B$ which are subsets of $E$, and consider the union $\bigcup_{B \in \Sigma} B$ of all those
boxes. Clearly, this union is contained in $E$, since every box in $\Sigma$ is
contained in $E$ by construction. On the other hand, since $E$ is open, we
see that for every $x \in E$ there is a ball $B(x, r)$ contained in $E$, and by
the previous claim this ball contains a rational box which contains $x$. In
particular, $x$ is contained in $\bigcup_{B \in \Sigma} B$. Thus we have

$$E = \bigcup_{B \in \Sigma} B$$

as desired; note that $\Sigma$ is countable or finite because it is a subset of the
set of all rational boxes, which is countable.


Lemma 7.4.11 (Borel property). Every open set, and every closed set, 
is Lebesgue measurable. 
Proof. It suffices to do this for open sets, since the claim for closed sets
then follows by Lemma 7.4.4(a) (i.e., property (ii)). Let E be an open 
set. By Lemma 7.4.10, E is the countable union of boxes. Since we 
already know that boxes are measurable, and that the countable union 
of measurable sets is measurable, the claim follows. 


The construction of Lebesgue measure and its basic properties are 
now complete. Now we make the next step in constructing the Lebesgue 
integral - describing the class of functions we can integrate. 


— Exercises — 


Exercise 7.4.1. If $A$ is an open interval in $\mathbf{R}$, show that $m^*(A) = m^*(A \cap
(0, \infty)) + m^*(A \setminus (0, \infty))$.


Exercise 7.4.2. If $A$ is an open box in $\mathbf{R}^n$, and $E$ is the half-plane $E :=
\{(x_1,\ldots,x_n) \in \mathbf{R}^n : x_n > 0\}$, show that $m^*(A) = m^*(A \cap E) + m^*(A \setminus E)$.
(Hint: use Exercise 7.4.1.)

Exercise 7.4.3. Prove Lemma 7.4.2. (Hint: use Exercise 7.4.2.) 


Exercise 7.4.4. Prove Lemma 7.4.4. (Hints: for (c), first prove that
$$m^*(A) = m^*(A \cap E_1 \cap E_2) + m^*(A \cap E_1 \setminus E_2) + m^*(A \cap E_2 \setminus E_1) + m^*(A \setminus (E_1 \cup E_2)).$$
A Venn diagram may be helpful. Also you may need the finite sub-additivity
property. Use (c) to prove (d), and use (b) and the various versions of Lemma
7.4.2 to prove (e)).


Exercise 7.4.5. Show that the set $E$ used in the proof of Propositions 7.3.1 and
7.3.3 is non-measurable.


Exercise 7.4.6. Prove Lemma 7.4.5. 
Exercise 7.4.7. Use Lemma 7.4.5 to prove Corollary 7.4.7. 


Exercise 7.4.8. Prove Lemma 7.4.9. (Hint: for the countable union problem,
write $J = \{j_1, j_2, \ldots\}$, write $F_N := \bigcup_{k=1}^N \Omega_{j_k}$, and write $E_N := F_N \setminus F_{N-1}$, with
the understanding that $F_0$ is the empty set. Then apply Lemma 7.4.8. For the
countable intersection problem, use what you just did and Lemma 7.4.4(a).)


Exercise 7.4.9. Let $A \subseteq \mathbf{R}^2$ be the set $A := [0,1]^2 \setminus \mathbf{Q}^2$; i.e., $A$ consists of all
the points $(x, y)$ in $[0,1]^2$ such that $x$ and $y$ are not both rational. Show that
$A$ is measurable and $m(A) = 1$, but that $A$ has no interior points. (Hint: it's
easier to use the properties of outer measure and measure, including those in
the exercises above, than to try to do this problem from first principles.)


Exercise 7.4.10. Let $A \subseteq B \subseteq \mathbf{R}^n$. Show that if $B$ is Lebesgue measurable
with measure zero, then $A$ is also Lebesgue measurable with measure zero.


7.5 Measurable functions 


In the theory of the Riemann integral, we are only able to integrate a 
certain class of functions - the Riemann integrable functions. We will 
now be able to integrate a much larger range of functions - the measur- 
able functions. More precisely, we can only integrate those measurable 
functions which are absolutely integrable - but more on that later. 


Definition 7.5.1 (Measurable functions). Let $\Omega$ be a measurable subset
of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}^m$ be a function. A function $f$ is measurable
iff $f^{-1}(V)$ is measurable for every open set $V \subseteq \mathbf{R}^m$.


As discussed earlier, most sets that we deal with in real life are 
measurable, so it is only natural to learn that most functions we deal 
with in real life are also measurable. For instance, continuous functions 
are automatically measurable: 


Lemma 7.5.2 (Continuous functions are measurable). Let $\Omega$ be a
measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}^m$ be continuous. Then $f$ is also
measurable.


Proof. Let $V$ be any open subset of $\mathbf{R}^m$. Then since $f$ is continuous,
$f^{-1}(V)$ is open relative to $\Omega$ (see Theorem 2.1.5(c)), i.e., $f^{-1}(V) = W \cap \Omega$
for some open set $W \subseteq \mathbf{R}^n$ (see Proposition 1.3.4(a)). Since $W$ is open,
it is measurable; since $\Omega$ is measurable, $W \cap \Omega$ is also measurable.


Because of Lemma 7.4.10, we have an easy criterion to test whether 
a function is measurable or not: 


Lemma 7.5.3. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}^m$
be a function. Then $f$ is measurable if and only if $f^{-1}(B)$ is measurable
for every open box $B$.


Proof. See Exercise 7.5.1. 
Corollary 7.5.4. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to
\mathbf{R}^m$ be a function. Suppose that $f = (f_1, \ldots, f_m)$, where $f_j : \Omega \to \mathbf{R}$ is
the $j$-th co-ordinate of $f$. Then $f$ is measurable if and only if all of the
$f_j$ are individually measurable.


Proof. See Exercise 7.5.2. 


Unfortunately, it is not true that the composition of two measur- 
able functions is automatically measurable; however we can do the next 
best thing: a continuous function applied to a measurable function is 
measurable. 


Lemma 7.5.5. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $W$ be an
open subset of $\mathbf{R}^m$. If $f : \Omega \to W$ is measurable, and $g : W \to \mathbf{R}^p$ is
continuous, then $g \circ f : \Omega \to \mathbf{R}^p$ is measurable.


Proof. See Exercise 7.5.3. 


This has an immediate corollary: 


Corollary 7.5.6. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$. If $f : \Omega \to \mathbf{R}$ is
a measurable function, then so is $|f|$, $\max(f, 0)$, and $\min(f, 0)$.


Proof. Apply Lemma 7.5.5 with $g(x) := |x|$, $g(x) := \max(x, 0)$, and
$g(x) := \min(x, 0)$.


A slightly less immediate corollary: 


Corollary 7.5.7. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$. If $f : \Omega \to \mathbf{R}$
and $g : \Omega \to \mathbf{R}$ are measurable functions, then so are $f + g$, $f - g$, $fg$,
$\max(f,g)$, and $\min(f,g)$. If $g(x) \neq 0$ for all $x \in \Omega$, then $f/g$ is also
measurable.


Proof. Consider $f + g$. We can write this as $k \circ h$, where $h : \Omega \to \mathbf{R}^2$
is the function $h(x) = (f(x), g(x))$, and $k : \mathbf{R}^2 \to \mathbf{R}$ is the function
$k(a,b) := a + b$. Since $f, g$ are measurable, $h$ is also measurable by
Corollary 7.5.4. Since $k$ is continuous, we thus see from Lemma 7.5.5
that $k \circ h$ is measurable, as desired. A similar argument deals with all
the other cases; the only point of concern in the $f/g$ case is that the space
$\mathbf{R}^2$ must be replaced with $\{(a,b) \in \mathbf{R}^2 : b \neq 0\}$ in order to keep the map
$(a,b) \mapsto a/b$ continuous and well-defined.

Another characterization of measurable functions is given by


Lemma 7.5.8. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$
be a function. Then $f$ is measurable if and only if $f^{-1}((a,\infty))$ is
measurable for every real number $a$.


Proof. See Exercise 7.5.4. 


Inspired by this lemma, we extend the notion of a measurable function
to the extended real number system $\mathbf{R}^* := \mathbf{R} \cup \{+\infty\} \cup \{-\infty\}$:


Definition 7.5.9 (Measurable functions in the extended reals). Let $\Omega$
be a measurable subset of $\mathbf{R}^n$. A function $f : \Omega \to \mathbf{R}^*$ is said to be
measurable iff $f^{-1}((a,+\infty])$ is measurable for every real number $a$.


Note that Lemma 7.5.8 ensures that the notion of measurability for 
functions taking values in the extended reals R* is compatible with that 
for functions taking values in just the reals R. 

Measurability behaves well with respect to limits: 


Lemma 7.5.10 (Limits of measurable functions are measurable). Let $\Omega$
be a measurable subset of $\mathbf{R}^n$. For each positive integer $n$, let $f_n : \Omega \to \mathbf{R}^*$
be a measurable function. Then the functions $\sup_{n \ge 1} f_n$, $\inf_{n \ge 1} f_n$,
$\limsup_{n \to \infty} f_n$, and $\liminf_{n \to \infty} f_n$ are also measurable. In particular, if
the $f_n$ converge pointwise to another function $f : \Omega \to \mathbf{R}^*$, then $f$ is
also measurable.


Proof. We first prove the claim about $\sup_{n \ge 1} f_n$. Call this function $g$.
We have to prove that $g^{-1}((a,+\infty])$ is measurable for every $a$. But by
the definition of supremum, we have

$$g^{-1}((a,+\infty]) = \bigcup_{n \ge 1} f_n^{-1}((a,+\infty])$$

(why?), and the claim follows since the countable union of measurable
sets is again measurable.

A similar argument works for $\inf_{n \ge 1} f_n$. The claims for lim sup and
lim inf then follow from the identities

$$\limsup_{n \to \infty} f_n = \inf_{N \ge 1} \sup_{n \ge N} f_n$$

and

$$\liminf_{n \to \infty} f_n = \sup_{N \ge 1} \inf_{n \ge N} f_n$$

(see Definition 6.4.6).

As you can see, just about anything one does to a measurable function
will produce another measurable function. This is basically why
almost every function one deals with in mathematics is measurable. (Indeed,
the only way to construct non-measurable functions is via artificial
means such as invoking the axiom of choice.)


— Exercises — 
Exercise 7.5.1. Prove Lemma 7.5.3. (Hint: use Lemma 7.4.10 and the $\sigma$-algebra
property.)

Exercise 7.5.2. Use Lemma 7.5.3 to deduce Corollary 7.5.4.

Exercise 7.5.3. Prove Lemma 7.5.5.

Exercise 7.5.4. Prove Lemma 7.5.8. (Hint: use Lemma 7.5.3. As a preliminary
step, you may need to show that if $f^{-1}((a,\infty))$ is measurable for all $a$, then
$f^{-1}([a,\infty))$ is also measurable for all $a$.)

Exercise 7.5.5. Let $f : \mathbf{R}^n \to \mathbf{R}$ be Lebesgue measurable, and let $g : \mathbf{R}^n \to \mathbf{R}$
be a function which agrees with $f$ outside of a set of measure zero, thus there
exists a set $A \subseteq \mathbf{R}^n$ of measure zero such that $f(x) = g(x)$ for all $x \in \mathbf{R}^n \setminus A$.
Show that $g$ is also Lebesgue measurable. (Hint: use Exercise 7.4.10.)


Chapter 8 


Lebesgue integration 


In Chapter 11, we approached the Riemann integral by first integrating 
a particularly simple class of functions, namely the piecewise constant 
functions. Among other things, piecewise constant functions only attain 
a finite number of values (as opposed to most functions in real life, 
which can take an infinite number of values). Once one learns how 
to integrate piecewise constant functions, one can then integrate other 
Riemann integrable functions by a similar procedure. 

We shall use a similar philosophy to construct the Lebesgue inte- 
gral. We shall begin by considering a special subclass of measurable 
functions - the simple functions. Then we will show how to integrate 
simple functions, and then from there we will integrate all measurable 
functions (or at least the absolutely integrable ones). 


8.1 Simple functions 


Definition 8.1.1 (Simple functions). Let $\Omega$ be a measurable subset of
$\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a measurable function. We say that $f$ is a
simple function if the image $f(\Omega)$ is finite. In other words, there exists
a finite number of real numbers $c_1, c_2, \ldots, c_N$ such that for every $x \in \Omega$,
we have $f(x) = c_j$ for some $1 \le j \le N$.


Example 8.1.2. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $E$ be
a measurable subset of $\Omega$. We define the characteristic function $\chi_E : \Omega \to \mathbf{R}$
by setting $\chi_E(x) := 1$ if $x \in E$, and $\chi_E(x) := 0$ if $x \notin E$.
(In some texts, $\chi_E$ is also written $1_E$, and is referred to as an indicator
function.) Then $\chi_E$ is a measurable function (why?), and is a simple
function, because the image $\chi_E(\Omega)$ is $\{0,1\}$ (or $\{0\}$ if $E$ is empty, or $\{1\}$
if $E = \Omega$).

© Springer Science+Business Media Singapore 2016 and Hindustan Book Agency 2015 187 


T. Tao, Analysis II, Texts and Readings in Mathematics 38, 
DOI 10.1007/978-981-10-1804-6_8 




We remark on three basic properties of simple functions: that they 
form a vector space, that they are linear combinations of characteris- 
tic functions, and that they approximate measurable functions. More 
precisely, we have the following three lemmas: 


Lemma 8.1.3. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$
and $g : \Omega \to \mathbf{R}$ be simple functions. Then $f + g$ is also a simple function.
Also, for any scalar $c \in \mathbf{R}$, the function $cf$ is also a simple function.


Proof. See Exercise 8.1.1. 


Lemma 8.1.4. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$
be a simple function. Then there exists a finite number of real
numbers $c_1, \ldots, c_N$, and a finite number of disjoint measurable sets
$E_1, E_2, \ldots, E_N$ in $\Omega$, such that $f = \sum_{i=1}^N c_i \chi_{E_i}$.


Proof. See Exercise 8.1.2. 


Lemma 8.1.5. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$
be a measurable function. Suppose that $f$ is always non-negative, i.e.,
$f(x) \ge 0$ for all $x \in \Omega$. Then there exists a sequence $f_1, f_2, f_3, \ldots$ of
simple functions, $f_n : \Omega \to \mathbf{R}$, such that the $f_n$ are non-negative and
increasing,

$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \ldots \text{ for all } x \in \Omega$$

and converge pointwise to $f$:

$$\lim_{n \to \infty} f_n(x) = f(x) \text{ for all } x \in \Omega.$$


Proof. See Exercise 8.1.3. 


We now show how to compute the integral of simple functions. 


Definition 8.1.6 (Lebesgue integral of simple functions). Let $\Omega$ be a
measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a simple function which
is non-negative; thus $f$ is measurable and the image $f(\Omega)$ is finite and
contained in $[0,\infty)$. We then define the Lebesgue integral $\int_\Omega f$ of $f$ on
$\Omega$ by

$$\int_\Omega f := \sum_{\lambda \in f(\Omega); \lambda > 0} \lambda\, m(\{x \in \Omega : f(x) = \lambda\}).$$

We will also sometimes write $\int_\Omega f$ as $\int_\Omega f\, dm$ (to emphasize the
role of Lebesgue measure $m$) or use a dummy variable such as $x$, e.g.,
$\int_\Omega f(x)\, dx$.


Example 8.1.7. Let $f : \mathbf{R} \to \mathbf{R}$ be the function which equals 3 on the
interval $[1,2]$, equals 4 on the interval $(2,4)$, and is zero everywhere else.
Then (with $\Omega = \mathbf{R}$)

$$\int_\Omega f := 3 \times m([1,2]) + 4 \times m((2,4)) = 3 \times 1 + 4 \times 2 = 11.$$

Or if $g : \mathbf{R} \to \mathbf{R}$ is the function which equals 1 on $[0,\infty)$ and is zero
everywhere else, then

$$\int_\Omega g = 1 \times m([0,\infty)) = 1 \times +\infty = +\infty.$$

Thus the integral of a simple function can equal $+\infty$. (The reason
why we restrict this integral to non-negative functions is to avoid ever
encountering the indeterminate form $+\infty + (-\infty)$.)


Remark 8.1.8. Note that this definition of integral corresponds to one’s 
intuitive notion of integration (at least of non-negative functions) as the 
area under the graph of the function (or volume, if one is in higher 
dimensions). 


Another formulation of the integral for non-negative simple functions 
is as follows. 


Lemma 8.1.9. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $E_1, \ldots, E_N$
be a finite number of disjoint measurable subsets in $\Omega$. Let $c_1, \ldots, c_N$
be non-negative numbers (not necessarily distinct). Then we have

$$\int_\Omega \sum_{j=1}^N c_j \chi_{E_j} = \sum_{j=1}^N c_j m(E_j).$$


Proof. We can assume that none of the $c_j$ are zero, since we can just
remove them from the sum on both sides of the equation. Let
$f := \sum_{j=1}^N c_j \chi_{E_j}$. Then $f(x)$ is either equal to one of the $c_j$ (if $x \in E_j$) or
equal to 0 (if $x \notin \bigcup_{j=1}^N E_j$). Thus $f$ is a simple function, and
$f(\Omega) \subseteq \{0\} \cup \{c_j : 1 \le j \le N\}$. Thus, by the definition,

$$\int_\Omega f = \sum_{\lambda \in \{c_j : 1 \le j \le N\}} \lambda\, m(\{x \in \Omega : f(x) = \lambda\})
= \sum_{\lambda \in \{c_j : 1 \le j \le N\}} \lambda\, m\Big( \bigcup_{1 \le j \le N; c_j = \lambda} E_j \Big).$$

But by the finite additivity property of Lebesgue measure, this is equal
to

$$\sum_{\lambda \in \{c_j : 1 \le j \le N\}} \lambda \sum_{1 \le j \le N; c_j = \lambda} m(E_j)
= \sum_{\lambda \in \{c_j : 1 \le j \le N\}} \sum_{1 \le j \le N; c_j = \lambda} c_j m(E_j).$$

Each $j$ appears exactly once in this sum, since $c_j$ is only equal to exactly
one value of $\lambda$. So the above expression is equal to $\sum_{1 \le j \le N} c_j m(E_j)$ as
desired.
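The agreement between Definition 8.1.6 and Lemma 8.1.9 can be checked numerically on the functions of Example 8.1.7. The following is only a sketch under stated assumptions: a simple function on $\mathbf{R}$ is represented (hypothetically) as a list of `(value, left, right)` triples describing disjoint intervals, and the helper names are ours, not the text's.

```python
import math
from collections import defaultdict

# Assumed representation (not from the text): a non-negative simple function
# on R is a list of (value, left, right) triples, one per interval; the
# intervals are assumed disjoint and the function is zero elsewhere.

def length(left, right):
    """Lebesgue measure of an interval with endpoints left <= right."""
    return right - left

def integral_by_definition(pieces):
    """Definition 8.1.6: sum of lambda * m({f = lambda}) over values lambda > 0."""
    measure_of_level = defaultdict(float)
    for value, left, right in pieces:
        if value > 0:
            # Pieces sharing a value are merged into one level set.
            measure_of_level[value] += length(left, right)
    return sum(value * measure for value, measure in measure_of_level.items())

def integral_by_lemma(pieces):
    """Lemma 8.1.9: sum of c_j * m(E_j); zero coefficients are skipped."""
    return sum(value * length(left, right)
               for value, left, right in pieces if value > 0)

f = [(3, 1, 2), (4, 2, 4)]      # Example 8.1.7, first function
g = [(1, 0, math.inf)]          # Example 8.1.7, second function
h = [(3, 1, 2), (3, 5, 7)]      # repeated coefficient, as Lemma 8.1.9 allows

for pieces in (f, g, h):
    print(integral_by_definition(pieces), integral_by_lemma(pieces))
```

Skipping the zero coefficients mirrors the first step of the proof and also avoids evaluating $0 \times \infty$ in floating point.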


Some basic properties of Lebesgue integration of non-negative simple 
functions: 


Proposition 8.1.10. Let $\Omega$ be a measurable set, and let $f : \Omega \to \mathbf{R}$ and
$g : \Omega \to \mathbf{R}$ be non-negative simple functions.

(a) We have $0 \le \int_\Omega f \le \infty$. Furthermore, we have $\int_\Omega f = 0$ if and
only if $m(\{x \in \Omega : f(x) \neq 0\}) = 0$.

(b) We have $\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.

(c) For any positive number $c$, we have $\int_\Omega cf = c \int_\Omega f$.

(d) If $f(x) \le g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \le \int_\Omega g$.


We make a very convenient notational convention: if a property $P(x)$
holds for all points in $\Omega$, except for a set of measure zero, then we say
that $P$ holds for almost every point in $\Omega$. Thus (a) asserts that $\int_\Omega f = 0$
if and only if $f$ is zero for almost every point in $\Omega$.


Proof. From Lemma 8.1.4 or from the formula

$$f = \sum_{\lambda \in f(\Omega) \setminus \{0\}} \lambda\, \chi_{\{x \in \Omega : f(x) = \lambda\}}$$

we can write $f$ as a combination of characteristic functions, say

$$f = \sum_{j=1}^N c_j \chi_{E_j},$$

where $E_1, \ldots, E_N$ are disjoint subsets of $\Omega$ and the $c_j$ are positive.
Similarly we can write

$$g = \sum_{k=1}^M d_k \chi_{F_k},$$

where $F_1, \ldots, F_M$ are disjoint subsets of $\Omega$ and the $d_k$ are positive.

(a) Since $\int_\Omega f = \sum_{j=1}^N c_j m(E_j)$ it is clear that the integral is between
0 and infinity. If $f$ is zero almost everywhere, then all of the $E_j$
must have measure zero (why?) and so $\int_\Omega f = 0$. Conversely, if
$\int_\Omega f = 0$, then $\sum_{j=1}^N c_j m(E_j) = 0$, which can only happen when
all of the $m(E_j)$ are zero (since all the $c_j$ are positive). But then
$\bigcup_{j=1}^N E_j$ has measure zero, and hence $f$ is zero almost everywhere
in $\Omega$.

(b) Write $E_0 := \Omega \setminus \bigcup_{j=1}^N E_j$ and $c_0 := 0$, then we have
$\Omega = E_0 \cup E_1 \cup \ldots \cup E_N$ and

$$f = \sum_{j=0}^N c_j \chi_{E_j}.$$

Similarly if we write $F_0 := \Omega \setminus \bigcup_{k=1}^M F_k$ and $d_0 := 0$ then

$$g = \sum_{k=0}^M d_k \chi_{F_k}.$$

Since $\Omega = E_0 \cup \ldots \cup E_N = F_0 \cup \ldots \cup F_M$, we have

$$f = \sum_{j=0}^N \sum_{k=0}^M c_j \chi_{E_j \cap F_k}$$

and

$$g = \sum_{k=0}^M \sum_{j=0}^N d_k \chi_{E_j \cap F_k}$$

and hence

$$f + g = \sum_{0 \le j \le N; 0 \le k \le M} (c_j + d_k) \chi_{E_j \cap F_k}.$$

By Lemma 8.1.9, we thus have

$$\int_\Omega (f + g) = \sum_{0 \le j \le N; 0 \le k \le M} (c_j + d_k)\, m(E_j \cap F_k).$$

On the other hand, we have

$$\int_\Omega f = \sum_{0 \le j \le N} c_j m(E_j) = \sum_{0 \le j \le N; 0 \le k \le M} c_j m(E_j \cap F_k)$$

and similarly

$$\int_\Omega g = \sum_{0 \le k \le M} d_k m(F_k) = \sum_{0 \le j \le N; 0 \le k \le M} d_k m(E_j \cap F_k)$$

and the claim (b) follows.

(c) Since $cf = \sum_{j=1}^N c c_j \chi_{E_j}$, we have $\int_\Omega cf = \sum_{j=1}^N c c_j m(E_j)$. Since
$\int_\Omega f = \sum_{j=1}^N c_j m(E_j)$, the claim follows.

(d) Write $h := g - f$. Then $h$ is simple and non-negative and $g = f + h$,
hence by (b) we have $\int_\Omega g = \int_\Omega f + \int_\Omega h$. But by (a) we have
$\int_\Omega h \ge 0$, and the claim follows.


— Exercises — 
Exercise 8.1.1. Prove Lemma 8.1.3. 
Exercise 8.1.2. Prove Lemma 8.1.4. 
Exercise 8.1.3. Prove Lemma 8.1.5. (Hint: set

$$f_n(x) := \sup\left\{ \frac{j}{2^n} : j \in \mathbf{Z}, \frac{j}{2^n} \le \min(f(x), 2^n) \right\},$$

i.e., $f_n(x)$ is the greatest integer multiple of $2^{-n}$ which does not exceed either
$f(x)$ or $2^n$. You may wish to draw a picture to see how $f_1$, $f_2$, $f_3$, etc. work.
Then prove that $f_n$ obeys all the required properties.)
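In place of a picture, the approximants in the hint can be tabulated at a single point. This is a minimal sketch, not a proof: the sample function $f(x) = x^2$, the sample point, and the helper name `dyadic_approx` are illustrative assumptions. It uses the observation that the supremum in the hint equals $\lfloor 2^n \min(f(x), 2^n) \rfloor / 2^n$.

```python
import math

def dyadic_approx(f, n):
    """Return f_n: the greatest multiple of 2**-n not exceeding min(f(x), 2**n)."""
    scale = 2 ** n

    def f_n(x):
        capped = min(f(x), scale)            # never exceed 2**n
        return math.floor(capped * scale) / scale  # round down to the dyadic grid
    return f_n

def f(x):
    """Sample non-negative function, chosen only for illustration."""
    return x * x

x = 1.3
values = [dyadic_approx(f, n)(x) for n in range(1, 11)]
for n, value in enumerate(values, start=1):
    print(n, value)
```

At this sample point the printed values never decrease and stay below $f(x)$, which is the behaviour Lemma 8.1.5 asks for; the exercise is to show this holds at every point.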


8.2 Integration of non-negative measurable functions 


We now pass from the integration of non-negative simple functions to
the integration of non-negative measurable functions. We will allow our
measurable functions to take the value of $+\infty$ sometimes.


Definition 8.2.1 (Majorization). Let $f : \Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$ be
functions. We say that $f$ majorizes $g$, or $g$ minorizes $f$, iff we have
$f(x) \ge g(x)$ for all $x \in \Omega$.


We sometimes use the phrase “f dominates g” instead of “f ma- 
jorizes g”. 


Definition 8.2.2 (Lebesgue integral for non-negative functions). Let $\Omega$
be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0,\infty]$ be measurable
and non-negative. Then we define the Lebesgue integral $\int_\Omega f$ of $f$ on $\Omega$
to be

$$\int_\Omega f := \sup\left\{ \int_\Omega s : s \text{ is simple and non-negative, and minorizes } f \right\}.$$


Remark 8.2.3. The reader should compare this notion to that of a 
lower Riemann integral from Definition 11.3.2. Interestingly, we will not 
need to match this lower integral with an upper integral here. 


Remark 8.2.4. Note that if $\Omega'$ is any measurable subset of $\Omega$, then we
can define $\int_{\Omega'} f$ as well by restricting $f$ to $\Omega'$, thus $\int_{\Omega'} f := \int_{\Omega'} f|_{\Omega'}$.


We have to check that this definition is consistent with our previous
notion of Lebesgue integral for non-negative simple functions; in other
words, if $f : \Omega \to \mathbf{R}$ is a non-negative simple function, then the value
of $\int_\Omega f$ given by this definition should be the same as the one given in
the previous definition. But this is clear because $f$ certainly minorizes
itself, and any other non-negative simple function $s$ which minorizes $f$
will have an integral $\int_\Omega s$ less than or equal to $\int_\Omega f$, thanks to Proposition
8.1.10(d).


Remark 8.2.5. Note that $\int_\Omega f$ is always at least 0, since 0 is simple,
non-negative, and minorizes $f$. Of course, $\int_\Omega f$ could equal $+\infty$.

Some basic properties of the Lebesgue integral on non-negative measurable
functions (which supersede Proposition 8.1.10):


Proposition 8.2.6. Let $\Omega$ be a measurable set, and let $f : \Omega \to [0,\infty]$
and $g : \Omega \to [0,\infty]$ be non-negative measurable functions.

(a) We have $0 \le \int_\Omega f \le \infty$. Furthermore, we have $\int_\Omega f = 0$ if and
only if $f(x) = 0$ for almost every $x \in \Omega$.

(b) For any positive number $c$, we have $\int_\Omega cf = c \int_\Omega f$.

(c) If $f(x) \le g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \le \int_\Omega g$.

(d) If $f(x) = g(x)$ for almost every $x \in \Omega$, then $\int_\Omega f = \int_\Omega g$.

(e) If $\Omega' \subseteq \Omega$ is measurable, then $\int_{\Omega'} f = \int_\Omega f \chi_{\Omega'} \le \int_\Omega f$.


Proof. See Exercise 8.2.1. 


Remark 8.2.7. Proposition 8.2.6(d) is quite interesting; it says that 
one can modify the values of a function on any measure zero set (e.g., 
you can modify a function on every rational number), and not affect its 
integral at all. It is as if no individual point, or even a measure zero 
collection of points, has any “vote” in what the integral of a function 
should be; only the collective set of points has an influence on an integral. 


Remark 8.2.8. Note that we do not yet try to interchange sums and
integrals. From the definition it is fairly easy to prove that
$\int_\Omega (f + g) \ge \int_\Omega f + \int_\Omega g$ (Exercise 8.2.2), but to prove equality requires more work
and will be done later.


As we have seen in previous chapters, we cannot always interchange 
an integral with a limit (or with limit-like concepts such as supremum). 
However, with the Lebesgue integral it is possible to do so if the functions 
are increasing: 


Theorem 8.2.9 (Lebesgue monotone convergence theorem). Let $\Omega$ be a
measurable subset of $\mathbf{R}^n$, and let $(f_n)_{n=1}^\infty$ be a sequence of non-negative
measurable functions from $\Omega$ to $\mathbf{R}^*$ which are increasing in the sense that

$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \ldots \text{ for all } x \in \Omega.$$

(Note we are assuming that $f_n(x)$ is increasing with respect to $n$; this
is a different notion from $f_n(x)$ increasing with respect to $x$.) Then we
have

$$0 \le \int_\Omega f_1 \le \int_\Omega f_2 \le \int_\Omega f_3 \le \ldots$$

and

$$\int_\Omega \sup_n f_n = \sup_n \int_\Omega f_n.$$


Proof. The first conclusion is clear from Proposition 8.2.6(c). Now we
prove the second conclusion. From Proposition 8.2.6(c) again we have

$$\int_\Omega \sup_m f_m \ge \int_\Omega f_n$$

for every $n$; taking suprema in $n$ we obtain

$$\int_\Omega \sup_m f_m \ge \sup_n \int_\Omega f_n$$

which is one half of the desired conclusion. To finish the proof we have
to show

$$\int_\Omega \sup_m f_m \le \sup_n \int_\Omega f_n.$$

From the definition of $\int_\Omega \sup_m f_m$, it will suffice to show that

$$\int_\Omega s \le \sup_n \int_\Omega f_n$$

for all simple non-negative functions $s$ which minorize $\sup_m f_m$.
Fix $s$. We will show that

$$(1 - \varepsilon) \int_\Omega s \le \sup_n \int_\Omega f_n$$

for every $0 < \varepsilon < 1$; the claim then follows by taking limits as $\varepsilon \to 0$.
Fix $\varepsilon$. By construction of $s$, we have

$$s(x) \le \sup_n f_n(x)$$

for every $x \in \Omega$. Hence, for every $x \in \Omega$ there exists an $N$ (depending
on $x$) such that

$$f_N(x) \ge (1 - \varepsilon) s(x).$$

Since the $f_n$ are increasing, this will imply that $f_n(x) \ge (1 - \varepsilon) s(x)$ for
all $n \ge N$. Thus, if we define the sets $E_n$ by

$$E_n := \{x \in \Omega : f_n(x) \ge (1 - \varepsilon) s(x)\}$$

then we have $E_1 \subseteq E_2 \subseteq E_3 \subseteq \ldots$ and $\bigcup_{n=1}^\infty E_n = \Omega$.
From Proposition 8.2.6(b), (c) and (e) we have

$$(1 - \varepsilon) \int_{E_n} s = \int_{E_n} (1 - \varepsilon) s \le \int_{E_n} f_n \le \int_\Omega f_n$$

so to finish the argument it will suffice to show that

$$\sup_n \int_{E_n} s = \int_\Omega s.$$

Since $s$ is a simple function, we may write $s = \sum_{j=1}^N c_j \chi_{F_j}$ for some
measurable $F_j$ and positive $c_j$. Since

$$\int_\Omega s = \sum_{j=1}^N c_j m(F_j)$$

and

$$\int_{E_n} s = \int_{E_n} \sum_{j=1}^N c_j \chi_{F_j \cap E_n} = \sum_{j=1}^N c_j m(F_j \cap E_n)$$

it thus suffices to show that

$$\sup_n m(F_j \cap E_n) = m(F_j)$$

for each $j$. But this follows from Exercise 7.2.3(a).


This theorem is extremely useful. For instance, we can now inter- 
change addition and integration: 


Lemma 8.2.10 (Interchange of addition and integration). Let $\Omega$ be a
measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0,\infty]$ and $g : \Omega \to [0,\infty]$ be
measurable functions. Then $\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.


Proof. By Lemma 8.1.5, there exists a sequence $0 \le s_1 \le s_2 \le \ldots \le f$
of simple functions such that $\sup_n s_n = f$, and similarly a sequence
$0 \le t_1 \le t_2 \le \ldots \le g$ of simple functions such that $\sup_n t_n = g$. Since
the $s_n$ are increasing and the $t_n$ are increasing, it is then easy to check
that $s_n + t_n$ is also increasing and $\sup_n (s_n + t_n) = f + g$ (why?). By the
monotone convergence theorem (Theorem 8.2.9) we thus have

$$\int_\Omega f = \sup_n \int_\Omega s_n$$
$$\int_\Omega g = \sup_n \int_\Omega t_n$$
$$\int_\Omega (f + g) = \sup_n \int_\Omega (s_n + t_n).$$

But by Proposition 8.1.10(b) we have $\int_\Omega (s_n + t_n) = \int_\Omega s_n + \int_\Omega t_n$. By
Proposition 8.1.10(d), $\int_\Omega s_n$ and $\int_\Omega t_n$ are both increasing in $n$, so

$$\sup_n \left( \int_\Omega s_n + \int_\Omega t_n \right) = \left( \sup_n \int_\Omega s_n \right) + \left( \sup_n \int_\Omega t_n \right)$$

and the claim follows.


Of course, once one can interchange an integral with a sum of two 
functions, one can handle an integral and any finite number of functions 
by induction. More surprisingly, one can handle infinite sums as well of 
non-negative functions: 


Corollary 8.2.11. If $\Omega$ is a measurable subset of $\mathbf{R}^n$, and $g_1, g_2, \ldots$
are a sequence of non-negative measurable functions from $\Omega$ to $[0,\infty]$,
then

$$\int_\Omega \sum_{n=1}^\infty g_n = \sum_{n=1}^\infty \int_\Omega g_n.$$


Proof. See Exercise 8.2.3. 


Remark 8.2.12. Note that we do not need to assume anything about
the convergence of the above sums; it may well happen that both sides
are equal to $+\infty$. However, we do need to assume non-negativity; see
Exercise 8.2.4.
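As a small worked illustration of Remark 8.2.12, both sides of Corollary 8.2.11 can equal $+\infty$ at once. The choice $\Omega = \mathbf{R}$ and $g_n := \chi_{[n,n+1)}$ below is an assumption made for the example, not taken from the text; the middle integral is evaluated with Definition 8.1.6.

```latex
\[
\sum_{n=1}^{\infty} g_n = \chi_{[1,\infty)},
\qquad
\int_{\mathbf{R}} \sum_{n=1}^{\infty} g_n = 1 \times m([1,\infty)) = +\infty,
\qquad
\sum_{n=1}^{\infty} \int_{\mathbf{R}} g_n = \sum_{n=1}^{\infty} 1 = +\infty.
\]
```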


One could similarly ask whether we could interchange limits and
integrals; in other words, is it true that

$$\int_\Omega \lim_{n \to \infty} f_n = \lim_{n \to \infty} \int_\Omega f_n?$$

Unfortunately, this is not true, as the following "moving bump" example
shows. For each $n = 1, 2, 3, \ldots$, let $f_n : \mathbf{R} \to \mathbf{R}$ be the function
$f_n = \chi_{[n,n+1)}$. Then $\lim_{n \to \infty} f_n(x) = 0$ for every $x$, but $\int_{\mathbf{R}} f_n = 1$ for every $n$,
and hence $\lim_{n \to \infty} \int_{\mathbf{R}} f_n = 1 \neq 0$. In other words, the limiting function
$\lim_{n \to \infty} f_n$ can end up having significantly smaller integral than any
of the original integrals. However, the following very useful lemma of
Fatou shows that the reverse cannot happen: there is no way the limiting
function has larger integral than the (limit of the) original integrals:


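Lemma 8.2.13 (Fatou's lemma). Let $\Omega$ be a measurable subset of $\mathbf{R}^n$,
and let $f_1, f_2, \ldots$ be a sequence of non-negative measurable functions from $\Omega$ to
$[0,\infty]$. Then

$$\int_\Omega \liminf_{n \to \infty} f_n \le \liminf_{n \to \infty} \int_\Omega f_n.$$


Proof. Recall that

$$\liminf_{n \to \infty} f_n = \sup_n \left( \inf_{m \ge n} f_m \right)$$

and hence by the monotone convergence theorem

$$\int_\Omega \liminf_{n \to \infty} f_n = \sup_n \int_\Omega \left( \inf_{m \ge n} f_m \right).$$

By Proposition 8.2.6(c) we have

$$\int_\Omega \left( \inf_{m \ge n} f_m \right) \le \int_\Omega f_j$$

for every $j \ge n$; taking infima in $j$ we obtain

$$\int_\Omega \left( \inf_{m \ge n} f_m \right) \le \inf_{j \ge n} \int_\Omega f_j.$$

Thus

$$\int_\Omega \liminf_{n \to \infty} f_n \le \sup_n \inf_{j \ge n} \int_\Omega f_j = \liminf_{n \to \infty} \int_\Omega f_n$$

as desired.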
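The moving bump example that precedes Fatou's lemma shows the inequality can be strict, and the two sides are easy to tabulate. The sketch below is only an illustration: the helper names are hypothetical, and the integral of each bump is taken to be the length of its interval, as Definition 8.1.6 gives for an indicator function.

```python
def moving_bump(n):
    """Indicator function of the interval [n, n+1)."""
    return lambda x: 1 if n <= x < n + 1 else 0

def integral_of_bump(n):
    """Integral of the indicator of [n, n+1): the length of that interval."""
    return (n + 1) - n

x = 2.5  # any fixed point; the bump eventually moves past it
tail_values = [moving_bump(n)(x) for n in range(1, 50)]
integrals = [integral_of_bump(n) for n in range(1, 50)]

print(tail_values[:5])   # values of f_1(x), ..., f_5(x) at the fixed point
print(set(integrals))    # every integral has the same value
```

At each fixed point the values are eventually 0, so the pointwise lim inf is the zero function with integral 0, while every individual integral equals 1.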


Note that we are allowing our functions to take the value $+\infty$ at
some points. It is even possible for a function to take the value $+\infty$
but still have a finite integral; for instance, if $E$ is a measure zero set,
and $f : \Omega \to \mathbf{R}^*$ is equal to $+\infty$ on $E$ but equals 0 everywhere else, then
$\int_\Omega f = 0$ by Proposition 8.2.6(a). However, if the integral is finite, the
function must be finite almost everywhere:


Lemma 8.2.14. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0,\infty]$
be a non-negative measurable function such that $\int_\Omega f$ is finite.
Then $f$ is finite almost everywhere (i.e., the set $\{x \in \Omega : f(x) = +\infty\}$
has measure zero).


Proof. See Exercise 8.2.5. 


From Corollary 8.2.11 and Lemma 8.2.14 one has a useful lemma:


Lemma 8.2.15 (Borel-Cantelli lemma). Let $\Omega_1, \Omega_2, \ldots$ be measurable
subsets of $\mathbf{R}^n$ such that $\sum_{n=1}^\infty m(\Omega_n)$ is finite. Then the set

$$\{x \in \mathbf{R}^n : x \in \Omega_n \text{ for infinitely many } n\}$$

is a set of measure zero. In other words, almost every point belongs to
only finitely many $\Omega_n$.


Proof. See Exercise 8.2.6. 


— Exercises — 


Exercise 8.2.1. Prove Proposition 8.2.6. (Hint: do not attempt to mimic the
proof of Proposition 8.1.10; rather, try to use Proposition 8.1.10 and Definition
8.2.2. For one direction of part (a), start with $\int_\Omega f = 0$ and conclude that
$m(\{x \in \Omega : f(x) > 1/n\}) = 0$ for every $n = 1, 2, 3, \ldots$, and then use the
countable sub-additivity. To prove (e), first prove it for simple functions.)

Exercise 8.2.2. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0,+\infty]$
and $g : \Omega \to [0,+\infty]$ be measurable functions. Without using Theorem 8.2.9
or Lemma 8.2.10, prove that $\int_\Omega (f + g) \ge \int_\Omega f + \int_\Omega g$.

Exercise 8.2.3. Prove Corollary 8.2.11. (Hint: use the monotone convergence
theorem with $f_N := \sum_{n=1}^N g_n$.)

Exercise 8.2.4. For each $n = 1, 2, 3, \ldots$, let $f_n : \mathbf{R} \to \mathbf{R}$ be the function
$f_n = \chi_{[n,n+1)} - \chi_{[n+1,n+2)}$; i.e., let $f_n(x)$ equal $+1$ when $x \in [n, n+1)$, equal
$-1$ when $x \in [n+1, n+2)$, and 0 everywhere else. Show that

$$\int_{\mathbf{R}} \sum_{n=1}^\infty f_n \neq \sum_{n=1}^\infty \int_{\mathbf{R}} f_n.$$

Explain why this does not contradict Corollary 8.2.11.

Exercise 8.2.5. Prove Lemma 8.2.14.

Exercise 8.2.6. Use Corollary 8.2.11 and Lemma 8.2.14 to prove Lemma 8.2.15.
(Hint: use the indicator functions $\chi_{\Omega_n}$.)

Exercise 8.2.7. Let $p > 2$ and $c > 0$. Using the Borel-Cantelli lemma, show
that the set

$$\left\{ x \in [0,1] : \left| x - \frac{a}{q} \right| \le \frac{c}{q^p} \text{ for infinitely many positive integers } a, q \right\}$$

has measure zero. (Hint: one only has to consider those integers $a$ in the range
$0 \le a \le q$ (why?). Use Corollary 11.6.5 to show that the sum $\sum_{q=1}^\infty \frac{c(q+1)}{q^p}$ is
finite.)


Exercise 8.2.8. Call a real number $x \in \mathbf{R}$ diophantine if there exist real numbers
$p, C > 0$ such that $|x - \frac{a}{q}| > C/|q|^p$ for all non-zero integers $q$ and all integers
$a$. Using Exercise 8.2.7, show that almost every real number is diophantine.
(Hint: first work in the interval $[0,1]$. Show that one can take $p$ and $C$ to be
rational and one can also take $p > 2$. Then use the fact that the countable
union of measure zero sets has measure zero.)


Exercise 8.2.9. For every positive integer $n$, let $f_n : \mathbf{R} \to [0,\infty)$ be a
non-negative measurable function such that

$$\int_{\mathbf{R}} f_n \le \frac{1}{4^n}.$$

Show that for every $\varepsilon > 0$, there exists a set $E$ of Lebesgue measure $m(E) \le \varepsilon$
such that $f_n(x)$ converges pointwise to zero for all $x \in \mathbf{R} \setminus E$. (Hint: first prove
that $m(\{x \in \mathbf{R} : f_n(x) > \frac{1}{\varepsilon 2^n}\}) \le \frac{\varepsilon}{2^n}$ for all $n = 1, 2, 3, \ldots$, and then consider
the union of all the sets $\{x \in \mathbf{R} : f_n(x) > \frac{1}{\varepsilon 2^n}\}$.)


Exercise 8.2.10. For every positive integer $n$, let $f_n : [0,1] \to [0,\infty)$ be a
non-negative measurable function such that $f_n$ converges pointwise to zero. Show
that for every $\varepsilon > 0$, there exists a set $E$ of Lebesgue measure $m(E) \le \varepsilon$ such
that $f_n(x)$ converges uniformly to zero for all $x \in [0,1] \setminus E$. (This is a special
case of Egoroff's theorem. To prove it, first show that for any positive integer
$m$, we can find an $N > 0$ such that $m(\{x \in [0,1] : f_n(x) > 1/m\}) \le \varepsilon/2^m$ for
all $n \ge N$.) Is the claim still true if $[0,1]$ is replaced by $\mathbf{R}$?


Exercise 8.2.11. Give an example of a bounded non-negative function
$f : \mathbf{N} \times \mathbf{N} \to \mathbf{R}^+$ such that $\sum_{m=1}^\infty f(n,m)$ converges for every $n$, and such that
$\lim_{n \to \infty} f(n,m)$ exists for every $m$, but such that

$$\lim_{n \to \infty} \sum_{m=1}^\infty f(n,m) \neq \sum_{m=1}^\infty \lim_{n \to \infty} f(n,m).$$

(Hint: modify the moving bump example. It is even possible to use a function
$f$ which only takes the values 0 and 1.) This shows that interchanging limits
and infinite sums can be dangerous.

8.3 Integration of absolutely integrable functions


We have now completed the theory of the Lebesgue integral for non-negative
functions. Now we consider how to integrate functions which
can be both positive and negative. However, we do wish to avoid the
indeterminate expression $+\infty + (-\infty)$, so we will restrict our attention to a
subclass of measurable functions: the absolutely integrable functions.


Definition 8.3.1 (Absolutely integrable functions). Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$. A measurable function $f : \Omega \to \mathbf{R}^*$ is said to be
absolutely integrable if the integral $\int_\Omega |f|$ is finite.


Of course, $|f|$ is always non-negative, so this definition makes sense
even if $f$ changes sign. Absolutely integrable functions are also known
as $L^1(\Omega)$ functions.

If $f : \Omega \to \mathbf{R}^*$ is a function, we define the positive part $f^+ : \Omega \to [0,\infty]$
and negative part $f^- : \Omega \to [0,\infty]$ by the formulae

$$f^+ := \max(f, 0); \quad f^- := -\min(f, 0).$$

From Corollary 7.5.6 we know that $f^+$ and $f^-$ are measurable. Observe
also that $f^+$ and $f^-$ are non-negative, that $f = f^+ - f^-$, and
$|f| = f^+ + f^-$. (Why?)
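The identities above can be spot-checked at a few points before proving them. This is a sketch only: the sample function $x^3 - x$ and the sample points are arbitrary choices of ours, and a finite check of this kind is of course no substitute for the argument requested by "(Why?)".

```python
def positive_part(f):
    """f^+ := max(f, 0), evaluated pointwise."""
    return lambda x: max(f(x), 0)

def negative_part(f):
    """f^- := -min(f, 0), evaluated pointwise."""
    return lambda x: -min(f(x), 0)

def f(x):
    """Sample sign-changing function, chosen only for illustration."""
    return x ** 3 - x

f_plus = positive_part(f)
f_minus = negative_part(f)

sample_points = [-2.0, -0.5, 0.5, 2.0]
for x in sample_points:
    # Columns: x, f(x), f^+(x), f^-(x)
    print(x, f(x), f_plus(x), f_minus(x))
```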


Definition 8.3.2 (Lebesgue integral). Let $f : \Omega \to \mathbf{R}^*$ be an absolutely
integrable function. We define the Lebesgue integral $\int_\Omega f$ of $f$ to be the
quantity

$$\int_\Omega f := \int_\Omega f^+ - \int_\Omega f^-.$$

Note that since $f$ is absolutely integrable, $\int_\Omega f^+$ and $\int_\Omega f^-$ are less
than or equal to $\int_\Omega |f|$ and hence are finite. Thus $\int_\Omega f$ is always finite;
we are never encountering the indeterminate form $+\infty - (+\infty)$.

Note that this definition is consistent with our previous definition
of the Lebesgue integral for non-negative functions, since if $f$ is non-negative
then $f^+ = f$ and $f^- = 0$. We also have the useful triangle
inequality

$$\left| \int_\Omega f \right| \le \int_\Omega f^+ + \int_\Omega f^- = \int_\Omega |f| \quad (8.1)$$

(Exercise 8.3.1).

Some other properties of the Lebesgue integral:


Proposition 8.3.3. Let $\Omega$ be a measurable set, and let $f : \Omega \to \mathbf{R}$ and
$g : \Omega \to \mathbf{R}$ be absolutely integrable functions.

(a) For any real number $c$ (positive, zero, or negative), we have that
$cf$ is absolutely integrable and $\int_\Omega cf = c \int_\Omega f$.

(b) The function $f + g$ is absolutely integrable, and
$\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.

(c) If $f(x) \le g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \le \int_\Omega g$.

(d) If $f(x) = g(x)$ for almost every $x \in \Omega$, then $\int_\Omega f = \int_\Omega g$.


Proof. See Exercise 8.3.2.


As mentioned in the previous section, one cannot necessarily interchange
limits and integrals, $\lim \int f_n = \int \lim f_n$, as the "moving bump
example" showed. However, it is possible to exclude the moving bump
example, and successfully interchange limits and integrals, if we know
that the functions $f_n$ are all majorized by a single absolutely integrable
function. This important theorem is known as the Lebesgue dominated
convergence theorem, and is extremely useful:


Theorem 8.3.4 (Lebesgue dominated convergence theorem). Let $\Omega$ be a
measurable subset of $\mathbf{R}^n$, and let $f_1, f_2, \ldots$ be a sequence of measurable
functions from $\Omega$ to $\mathbf{R}^*$ which converge pointwise. Suppose also
that there is an absolutely integrable function $F : \Omega \to [0,\infty]$ such that
$|f_n(x)| \le F(x)$ for all $x \in \Omega$ and all $n = 1, 2, 3, \ldots$. Then

$$\int_\Omega \lim_{n \to \infty} f_n = \lim_{n \to \infty} \int_\Omega f_n.$$

Proof. Let $f : \Omega \to \mathbf{R}^*$ be the function $f(x) := \lim_{n \to \infty} f_n(x)$; this
function exists by hypothesis. By Lemma 7.5.10, $f$ is measurable. Also,
since $|f_n(x)| \le F(x)$ for all $n$ and all $x \in \Omega$, we see that each $f_n$ is
absolutely integrable, and by taking limits we obtain $|f(x)| \le F(x)$ for
all $x \in \Omega$, so $f$ is also absolutely integrable. Our task is to show that
$\lim_{n \to \infty} \int_\Omega f_n = \int_\Omega f$.

The functions $F + f_n$ are non-negative and converge pointwise to
$F + f$. So by Fatou's lemma (Lemma 8.2.13)

$$\int_\Omega (F + f) \le \liminf_{n \to \infty} \int_\Omega (F + f_n)$$

and thus

$$\int_\Omega f \le \liminf_{n \to \infty} \int_\Omega f_n.$$

But the functions $F - f_n$ are also non-negative and converge pointwise
to $F - f$. So by Fatou's lemma again

$$\int_\Omega (F - f) \le \liminf_{n \to \infty} \int_\Omega (F - f_n).$$

Since the right-hand side is $\int_\Omega F - \limsup_{n \to \infty} \int_\Omega f_n$ (why did the lim
inf become a lim sup?), we thus have

$$\int_\Omega f \ge \limsup_{n \to \infty} \int_\Omega f_n.$$

Thus the lim inf and lim sup of $\int_\Omega f_n$ are both equal to $\int_\Omega f$, as desired.
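A worked instance may help fix ideas. The choices $\Omega = [0,1]$, $f_n(x) := x^n$ and $F := 1$ below are illustrative assumptions, not from the text, and the value of $\int f_n$ is computed as a Riemann integral, which is only justified by Proposition 8.4.1 in the next section. The display assumes the amsmath package.

```latex
\begin{align*}
|f_n(x)| = x^n &\le 1 = F(x) \quad \text{for all } x \in [0,1],
  \qquad \int_{[0,1]} F = m([0,1]) = 1 < \infty, \\
\lim_{n \to \infty} f_n(x) &=
  \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1, \end{cases}
  \qquad \text{so } \int_{[0,1]} \lim_{n \to \infty} f_n = 0
  \text{ by Proposition 8.3.3(d)}, \\
\int_{[0,1]} f_n &= \frac{1}{n+1} \to 0 \quad \text{as } n \to \infty.
\end{align*}
```

Both sides of the conclusion of Theorem 8.3.4 are therefore equal to 0 in this example.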


Finally, we record a lemma which is not particularly interesting in 
itself, but will have some useful consequences later in these notes. 


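Definition 8.3.5 (Upper and lower Lebesgue integral). Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a function (not necessarily
measurable). We define the upper Lebesgue integral $\overline{\int}_\Omega f$ to be

$$\overline{\int}_\Omega f := \inf\left\{ \int_\Omega g : g \text{ is an absolutely integrable function from } \Omega \text{ to } \mathbf{R} \text{ that majorizes } f \right\}$$

and the lower Lebesgue integral $\underline{\int}_\Omega f$ to be

$$\underline{\int}_\Omega f := \sup\left\{ \int_\Omega g : g \text{ is an absolutely integrable function from } \Omega \text{ to } \mathbf{R} \text{ that minorizes } f \right\}.$$


It is easy to see that $\underline{\int}_\Omega f \le \overline{\int}_\Omega f$ (why? use Proposition 8.3.3(c)).
When $f$ is absolutely integrable then equality occurs (why?). The converse
is also true:


Lemma 8.3.6. Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$
be a function (not necessarily measurable). Let $A$ be a real number, and
suppose $\overline{\int}_\Omega f = \underline{\int}_\Omega f = A$. Then $f$ is absolutely integrable, and

$$\int_\Omega f = \overline{\int}_\Omega f = \underline{\int}_\Omega f = A.$$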

Proof. By definition of upper Lebesgue integral, for every integer $n \ge 1$
we may find an absolutely integrable function $f_n^+ : \Omega \to \mathbf{R}$ which
majorizes $f$ such that

$$\int_\Omega f_n^+ \le A + \frac{1}{n}.$$

Similarly we may find an absolutely integrable function $f_n^- : \Omega \to \mathbf{R}$
which minorizes $f$ such that

$$\int_\Omega f_n^- \ge A - \frac{1}{n}.$$

Let $F^+ := \inf_n f_n^+$ and $F^- := \sup_n f_n^-$. Then $F^+$ and $F^-$ are measurable
(by Lemma 7.5.10) and absolutely integrable (because they are
squeezed between the absolutely integrable functions $f_1^+$ and $f_1^-$, for
instance). Also, $F^+$ majorizes $f$ and $F^-$ minorizes $f$. Finally, we have

$$\int_\Omega F^+ \le \int_\Omega f_n^+ \le A + \frac{1}{n}$$

for every $n$, and hence

$$\int_\Omega F^+ \le A.$$

Similarly we have

$$\int_\Omega F^- \ge A,$$

but $F^+$ majorizes $F^-$, and hence $\int_\Omega F^+ \ge \int_\Omega F^-$. Hence we must have

$$\int_\Omega F^+ = \int_\Omega F^- = A.$$

In particular

$$\int_\Omega (F^+ - F^-) = 0.$$

By Proposition 8.2.6(a), we thus have $F^+(x) = F^-(x)$ for almost every
$x$. But since $f$ is squeezed between $F^-$ and $F^+$, we thus have
$f(x) = F^+(x) = F^-(x)$ for almost every $x$. In particular, $f$ differs from the
absolutely integrable function $F^+$ only on a set of measure zero and is
thus measurable (see Exercise 7.5.5) and absolutely integrable, with

$$\int_\Omega f = \int_\Omega F^+ = \int_\Omega F^- = A$$

as desired.
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— Exercises — 
Exercise 8.3.1. Prove (8.1) whenever 2 is a measurable subset of R” and f is 
an absolutely integrable function. 


Exercise 8.3.2. Prove Proposition 8.3.3. (Hint: for (b), break f, g, and f+ g 
up into positive and negative parts, and try to write everything in terms of 
integrals of non-negative functions only, using Lemma 8.2.10.) 


Exercise 8.3.3. Let $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$
be absolutely integrable, measurable functions such that $f(x) \leq g(x)$ for
all $x \in \mathbf{R}$, and that $\int_{\mathbf{R}} f = \int_{\mathbf{R}} g$.
Show that $f(x) = g(x)$ for almost every $x \in \mathbf{R}$ (i.e., that
$f(x) = g(x)$ for all $x \in \mathbf{R}$ except possibly for a set of measure
zero).


8.4 Comparison with the Riemann integral 


We have spent a lot of effort constructing the Lebesgue integral, but have 
not yet addressed the question of how to actually compute any Lebesgue 
integrals, and whether Lebesgue integration is any different from the 
Riemann integral (say for integrals in one dimension). Now we show 
that the Lebesgue integral is a generalization of the Riemann integral. 
To clarify the following discussion, we shall temporarily distinguish the 
Riemann integral from the Lebesgue integral by writing the Riemann 
integral $\int_I f$ as $R.\int_I f$.
Our objective here is to prove 


Proposition 8.4.1. Let $I \subseteq \mathbf{R}$ be an interval, and let
$f : I \to \mathbf{R}$ be a Riemann integrable function. Then $f$ is also
absolutely integrable, and $\int_I f = R.\int_I f$.

Proof. Write $A := R.\int_I f$. Since $f$ is Riemann integrable, we know that
the upper and lower Riemann integrals are equal to $A$. Thus, for every
$\varepsilon > 0$, there exists a partition $P$ of $I$ into smaller intervals
$J$ such that

$$A - \varepsilon \leq \sum_{J \in P} |J| \inf_{x \in J} f(x) \leq A \leq \sum_{J \in P} |J| \sup_{x \in J} f(x) \leq A + \varepsilon,$$

where $|J|$ denotes the length of $J$. Note that $|J|$ is the same as $m(J)$,
since $J$ is a box.
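Let $f_\varepsilon^- : I \to \mathbf{R}$ and $f_\varepsilon^+ : I \to \mathbf{R}$
be the functions

$$f_\varepsilon^-(x) = \sum_{J \in P} \Big( \inf_{y \in J} f(y) \Big) \chi_J(x)$$

and

$$f_\varepsilon^+(x) = \sum_{J \in P} \Big( \sup_{y \in J} f(y) \Big) \chi_J(x);$$

these are simple functions and hence measurable and absolutely integrable.
By Lemma 8.1.9 we have

$$\int_I f_\varepsilon^- = \sum_{J \in P} |J| \inf_{x \in J} f(x)$$

and

$$\int_I f_\varepsilon^+ = \sum_{J \in P} |J| \sup_{x \in J} f(x)$$

and hence

$$A - \varepsilon \leq \int_I f_\varepsilon^- \leq A \leq \int_I f_\varepsilon^+ \leq A + \varepsilon.$$

Since $f_\varepsilon^+$ majorizes $f$, and $f_\varepsilon^-$ minorizes $f$, we
thus have

$$A - \varepsilon \leq \underline{\int}_I f \leq \overline{\int}_I f \leq A + \varepsilon$$

for every $\varepsilon$, and thus

$$\underline{\int}_I f = \overline{\int}_I f = A,$$

and hence by Lemma 8.3.6, $f$ is absolutely integrable with $\int_I f = A$, as
desired.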
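
As an illustration of the squeeze used in this proof, the following is a
minimal sketch that computes the integrals of the lower and upper simple
functions exactly in one concrete case. The choices $f(x) = x^2$, $I = [0,1]$,
the uniform partition into $n$ pieces, and the helper name `darboux_sums` are
assumptions made for this example and do not come from the text. The code
also assumes that $f$ is nondecreasing, so that the infimum and supremum on
each piece are attained at its endpoints.

```python
from fractions import Fraction


def darboux_sums(f, a, b, n):
    """Return (lower, upper) sums of f over [a, b] split into n equal pieces.

    Assumption for this sketch: f is nondecreasing on [a, b], so on each
    piece J the infimum is f(left endpoint) and the supremum is
    f(right endpoint).
    """
    width = Fraction(b - a, n)  # |J| for every piece J of the partition
    lower = Fraction(0)
    upper = Fraction(0)
    for k in range(n):
        left = a + k * width
        right = left + width
        lower += width * f(left)   # |J| * inf of f on J
        upper += width * f(right)  # |J| * sup of f on J
    return lower, upper


def square(x):
    return x * x


n = 10
lower, upper = darboux_sums(square, 0, 1, n)
A = Fraction(1, 3)  # the Riemann integral of x^2 over [0, 1]
print(lower, A, upper)
```

With $n = 10$ the two sums are 57/200 and 77/200. They bracket $A = 1/3$ and
differ by exactly $1/n$, so any $n \geq 1/\varepsilon$ gives a partition with
the property used in the proof.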


Thus every Riemann integrable function is also Lebesgue integrable,
at least on bounded intervals, and we no longer need the $R.\int_I f$ notation.
However, the converse is not true. Take for instance the function
$f : [0,1] \to \mathbf{R}$ defined by $f(x) := 1$ when $x$ is rational, and
$f(x) := 0$ when $x$ is irrational. Then from Proposition 11.7.1 we know that
$f$ is not Riemann integrable. On the other hand, $f$ is the characteristic
function of the set $\mathbf{Q} \cap [0,1]$, which is countable and hence has
measure zero. Thus $f$ is Lebesgue integrable and $\int_{[0,1]} f = 0$. Thus
the Lebesgue integral can handle more functions than the Riemann integral;
this is one of the primary reasons why we use the Lebesgue integral in
analysis. (The other reason is that the Lebesgue integral interacts well with
limits, as the Lebesgue monotone convergence theorem, Fatou’s lemma, and
Lebesgue dominated convergence theorem already attest. There are no
comparable theorems for the Riemann integral.)
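
The two facts used in this example can be checked by hand. The following is
only a sketch: the enumeration $q_1, q_2, \ldots$ of the rationals in $[0,1]$
and the intervals $I_k$ are notation introduced for this illustration, and
$P$ stands for an arbitrary partition of $[0,1]$ into intervals.

```latex
% Cover the k-th rational q_k by an open interval of length eps / 2^k,
% where eps > 0 is arbitrary.
I_k := \left( q_k - \frac{\varepsilon}{2^{k+1}},\; q_k + \frac{\varepsilon}{2^{k+1}} \right),
\qquad
\mathbf{Q} \cap [0,1] \subseteq \bigcup_{k=1}^{\infty} I_k,
\qquad
\sum_{k=1}^{\infty} |I_k| = \sum_{k=1}^{\infty} \frac{\varepsilon}{2^{k}} = \varepsilon .

% Hence m(Q \cap [0,1]) <= eps for every eps > 0, so the measure is zero and
\int_{[0,1]} f = \int_{[0,1]} \chi_{\mathbf{Q} \cap [0,1]} = m(\mathbf{Q} \cap [0,1]) = 0 .

% By contrast, every interval J of positive length contains both rational
% and irrational points, so for every partition P of [0,1]:
\sum_{J \in P} |J| \sup_{x \in J} f(x) = 1 ,
\qquad
\sum_{J \in P} |J| \inf_{x \in J} f(x) = 0 .
```

The upper and lower Riemann sums therefore never approach a common value,
while the Lebesgue integral is unaffected by the measure zero set on which
$f$ is non-zero.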
8.5 Fubini’s theorem


In one dimension we have shown that the Lebesgue integral is connected 
to the Riemann integral. Now we will try to understand the connection 
in higher dimensions. To simplify the discussion we shall just study 
two-dimensional integrals, although the arguments we present here can 
easily be extended to higher dimensions. 

We shall study integrals of the form $\int_{\mathbf{R}^2} f$. Note that once
we know how to integrate on $\mathbf{R}^2$, we can integrate on measurable
subsets $\Omega$ of $\mathbf{R}^2$, since $\int_\Omega f$ can be rewritten as
$\int_{\mathbf{R}^2} f\chi_\Omega$.

Let $f(x,y)$ be a function of two variables. In principle, we have three
different ways to integrate $f$ on $\mathbf{R}^2$. First of all, we can use the
two-dimensional Lebesgue integral, to obtain $\int_{\mathbf{R}^2} f$. Secondly,
we can fix $x$ and compute a one-dimensional integral in $y$, and then take
that quantity and integrate in $x$, thus obtaining
$\int_{\mathbf{R}} (\int_{\mathbf{R}} f(x,y)\,dy)\,dx$. Thirdly, we could fix
$y$ and integrate in $x$, and then integrate in $y$, thus obtaining
$\int_{\mathbf{R}} (\int_{\mathbf{R}} f(x,y)\,dx)\,dy$.

Fortunately, if the function $f$ is absolutely integrable on $\mathbf{R}^2$,
then all three integrals are equal:


Theorem 8.5.1 (Fubini’s theorem). Let $f : \mathbf{R}^2 \to \mathbf{R}$ be an
absolutely integrable function. Then there exist absolutely integrable
functions $F : \mathbf{R} \to \mathbf{R}$ and $G : \mathbf{R} \to \mathbf{R}$
such that for almost every $x$, $f(x,y)$ is absolutely integrable in $y$ with

$$F(x) = \int_{\mathbf{R}} f(x,y)\,dy,$$

and for almost every $y$, $f(x,y)$ is absolutely integrable in $x$ with

$$G(y) = \int_{\mathbf{R}} f(x,y)\,dx.$$

Finally, we have

$$\int_{\mathbf{R}} F(x)\,dx = \int_{\mathbf{R}^2} f = \int_{\mathbf{R}} G(y)\,dy.$$


Remark 8.5.2. Very roughly speaking, Fubini’s theorem says that

$$\int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f(x,y)\,dy \Big) dx = \int_{\mathbf{R}^2} f = \int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f(x,y)\,dx \Big) dy.$$

This allows us to compute two-dimensional integrals by splitting them
into two one-dimensional integrals. The reason why we do not write
Fubini’s theorem this way, though, is that it is possible that the integral
$\int_{\mathbf{R}} f(x,y)\,dy$ does not actually exist for every $x$, and
similarly $\int_{\mathbf{R}} f(x,y)\,dx$ does not exist for every $y$; Fubini’s
theorem only asserts that these integrals exist for almost every $x$ and $y$.
For instance, if $f(x,y)$ is the function which equals $1$ when $y > 0$ and
$x = 0$, equals $-1$ when $y < 0$ and $x = 0$, and is zero otherwise, then $f$
is absolutely integrable on $\mathbf{R}^2$ and $\int_{\mathbf{R}^2} f = 0$
(since $f$ equals zero almost everywhere in $\mathbf{R}^2$), but $f(x,y)$ is
not absolutely integrable in $y$ when $x = 0$ (though it is absolutely
integrable in $y$ for every other $x$).
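
For a simple function built from finitely many boxes, all three integrals
discussed in this section can be computed exactly, which gives a quick sanity
check of the identity in Remark 8.5.2. The sketch below is only an
illustration: the list `BOXES`, its coefficients, and the helper names are
hypothetical choices, and half-open boxes $[x_0, x_1) \times [y_0, y_1)$ are
used so that every slice is a step function.

```python
from fractions import Fraction

# Hypothetical simple function f = sum of c * (indicator of a box).
# Each entry is (x0, x1, y0, y1, c) for the box [x0, x1) x [y0, y1).
BOXES = [
    (0, 2, 0, 1, Fraction(3)),
    (1, 3, 0, 2, Fraction(-1, 2)),
    (0, 1, 1, 4, Fraction(5, 3)),
]


def double_integral(boxes):
    """Two-dimensional integral: sum of c * m(box)."""
    return sum(c * (x1 - x0) * (y1 - y0) for x0, x1, y0, y1, c in boxes)


def inner_dy(boxes, x):
    """F(x): the integral of f(x, y) in y for a fixed x."""
    return sum(c * (y1 - y0) for x0, x1, y0, y1, c in boxes if x0 <= x < x1)


def inner_dx(boxes, y):
    """G(y): the integral of f(x, y) in x for a fixed y."""
    return sum(c * (x1 - x0) for x0, x1, y0, y1, c in boxes if y0 <= y < y1)


def integrate_step(g, breakpoints):
    """Integrate a step function that is constant between breakpoints."""
    pts = sorted(set(breakpoints))
    total = Fraction(0)
    for left, right in zip(pts, pts[1:]):
        mid = Fraction(left + right, 2)  # any interior point gives the value
        total += (right - left) * g(mid)
    return total


xs = [p for x0, x1, _, _, _ in BOXES for p in (x0, x1)]
ys = [p for _, _, y0, y1, _ in BOXES for p in (y0, y1)]

two_dim = double_integral(BOXES)
dy_then_dx = integrate_step(lambda x: inner_dy(BOXES, x), xs)
dx_then_dy = integrate_step(lambda y: inner_dx(BOXES, y), ys)
print(two_dim, dy_then_dx, dx_then_dy)
```

All three values agree for this input. Exact rational arithmetic is used so
that the comparison is an equality rather than a floating-point
approximation.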


Proof. The proof of Fubini’s theorem is quite complicated and we will 
only give a sketch here. We begin with a series of reductions. 

Roughly speaking (ignoring issues relating to sets of measure zero), 
we have to show that 


$$\int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f(x,y)\,dy \Big) dx = \int_{\mathbf{R}^2} f$$


together with a similar equality with x and y reversed. We shall just 
prove the above equality, as the other one is very similar. 

First of all, it suffices to prove the theorem for non-negative
functions, since the general case then follows by writing a general function
$f$ as a difference $f^+ - f^-$ of two non-negative functions, and applying
Fubini’s theorem to $f^+$ and $f^-$ separately (and using Proposition 8.3.3(a)
and (b)). Thus we will henceforth assume that $f$ is non-negative.

Next, it suffices to prove the theorem for non-negative functions $f$
supported on a bounded set such as $[-N,N] \times [-N,N]$ for some positive
integer $N$. Indeed, once one obtains Fubini’s theorem for such
functions, one can then write a general function $f$ as the supremum of
such compactly supported functions as

$$f = \sup_{N > 0} f\chi_{[-N,N] \times [-N,N]},$$

apply Fubini’s theorem to each function $f\chi_{[-N,N] \times [-N,N]}$
separately, and then take suprema using the monotone convergence theorem.
Thus we will henceforth assume that $f$ is supported on
$[-N,N] \times [-N,N]$.
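
Ignoring questions of measurability and sets of measure zero, as the sketch
of the proof does, this limiting step can be written out as follows. The
abbreviation $f_N$ is introduced here only for convenience.

```latex
% f_N increases pointwise to f as N -> infinity, and every f_N is >= 0.
f_N := f \chi_{[-N,N] \times [-N,N]},
\qquad 0 \le f_1 \le f_2 \le \cdots,
\qquad \sup_{N} f_N = f .

% Monotone convergence in y (for fixed x), and then in x:
\int_{\mathbf{R}} f_N(x,y)\,dy \;\nearrow\; \int_{\mathbf{R}} f(x,y)\,dy ,
\qquad
\int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f_N(x,y)\,dy \Big) dx
\;\nearrow\;
\int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f(x,y)\,dy \Big) dx .

% Monotone convergence on R^2:
\int_{\mathbf{R}^2} f_N \;\nearrow\; \int_{\mathbf{R}^2} f .

% Fubini for each f_N identifies the two increasing sequences term by term,
% and hence identifies their limits:
\int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f_N(x,y)\,dy \Big) dx = \int_{\mathbf{R}^2} f_N
\ \text{ for all } N
\quad\Longrightarrow\quad
\int_{\mathbf{R}} \Big( \int_{\mathbf{R}} f(x,y)\,dy \Big) dx = \int_{\mathbf{R}^2} f .
```

The monotone convergence theorem is applied twice on the left-hand side:
once to the inner integral and once to the outer one.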

By another similar argument, it suffices to prove the theorem for
non-negative simple functions supported on $[-N,N] \times [-N,N]$, since one
can use Lemma 8.1.4 to write $f$ as the supremum of simple functions (which
must also be supported on $[-N,N] \times [-N,N]$), apply Fubini’s theorem to
each simple function, and then take suprema using the monotone convergence
theorem. Thus we may assume that $f$ is a non-negative simple function
supported on $[-N,N] \times [-N,N]$.

Next, we see that it suffices to prove the theorem for characteristic
functions supported in $[-N,N] \times [-N,N]$. This is because every simple
function is a linear combination of characteristic functions, and so we can
deduce Fubini’s theorem for simple functions from Fubini’s theorem for
characteristic functions. Thus we may take $f = \chi_E$ for some measurable
$E \subseteq [-N,N] \times [-N,N]$. Our task is then to show (ignoring sets of
measure zero) that

$$\int_{[-N,N]} \Big( \int_{[-N,N]} \chi_E(x,y)\,dy \Big) dx = m(E).$$

It will suffice to show the upper Lebesgue integral estimate

$$\overline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx \leq m(E). \qquad (8.2)$$

We will prove this estimate later. Once we show this for every set $E$, we
may substitute $E$ with $[-N,N] \times [-N,N] \setminus E$ and obtain

$$\overline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_{[-N,N] \times [-N,N] \setminus E}(x,y)\,dy \Big) dx \leq 4N^2 - m(E).$$

But the left-hand side is equal to

$$\overline{\int}_{[-N,N]} \Big( 2N - \underline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx$$

which is in turn equal to

$$4N^2 - \underline{\int}_{[-N,N]} \Big( \underline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx$$

and thus we have

$$\underline{\int}_{[-N,N]} \Big( \underline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx \geq m(E).$$
In particular we have

$$\underline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx \geq m(E)$$

and hence by Lemma 8.3.6 we see that
$\overline{\int}_{[-N,N]} \chi_E(x,y)\,dy$ is absolutely integrable and

$$\int_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx = m(E).$$

A similar argument shows that

$$\int_{[-N,N]} \Big( \underline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx = m(E)$$

and hence

$$\int_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\,dy - \underline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx = 0.$$

Thus by Proposition 8.2.6(a) we have

$$\underline{\int}_{[-N,N]} \chi_E(x,y)\,dy = \overline{\int}_{[-N,N]} \chi_E(x,y)\,dy$$

for almost every $x \in [-N,N]$. Thus $\chi_E(x,y)$ is absolutely integrable in
$y$ for almost every $x$, and $\int_{[-N,N]} \chi_E(x,y)\,dy$ is thus equal
(almost everywhere) to a function $F(x)$ such that

$$\int_{[-N,N]} F(x)\,dx = m(E)$$

as desired.

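It remains to prove the bound (8.2). Let $\varepsilon > 0$ be arbitrary. Since
$m(E)$ is the same as the outer measure $m^*(E)$, we know that there exists
an at most countable collection $(B_j)_{j \in J}$ of boxes such that
$E \subseteq \bigcup_{j \in J} B_j$ and

$$\sum_{j \in J} m(B_j) \leq m(E) + \varepsilon.$$

Each box $B_j$ can be written as $B_j = I_j \times I'_j$ for some intervals
$I_j$ and $I'_j$. Replacing each $B_j$ by its intersection with
$[-N,N] \times [-N,N]$ if necessary (this is again a box or empty, the new
collection still covers $E$, and no $m(B_j)$ increases), we may assume that
$I_j, I'_j \subseteq [-N,N]$. Observe that

$$\begin{aligned}
m(B_j) = |I_j||I'_j| &= \int_{I_j} |I'_j|\,dx = \int_{I_j} \Big( \int_{I'_j} dy \Big) dx \\
&= \int_{[-N,N]} \Big( \int_{[-N,N]} \chi_{I_j \times I'_j}(x,y)\,dy \Big) dx \\
&= \int_{[-N,N]} \Big( \int_{[-N,N]} \chi_{B_j}(x,y)\,dy \Big) dx.
\end{aligned}$$

Adding this over all $j \in J$ (using Corollary 8.2.11) we obtain

$$\sum_{j \in J} m(B_j) = \int_{[-N,N]} \Big( \int_{[-N,N]} \sum_{j \in J} \chi_{B_j}(x,y)\,dy \Big) dx.$$

In particular we have

$$\int_{[-N,N]} \Big( \int_{[-N,N]} \sum_{j \in J} \chi_{B_j}(x,y)\,dy \Big) dx \leq m(E) + \varepsilon.$$

But $\sum_{j \in J} \chi_{B_j}$ majorizes $\chi_E$ (why?) and thus

$$\overline{\int}_{[-N,N]} \Big( \overline{\int}_{[-N,N]} \chi_E(x,y)\,dy \Big) dx \leq m(E) + \varepsilon.$$

But $\varepsilon$ is arbitrary, and so we have (8.2) as desired. This
completes the proof of Fubini’s theorem.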
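
To make the covering step concrete, here is a small sketch for one specific
set. The triangle $E = \{(x,y) : 0 \leq y \leq x < 1\}$, the column-shaped
boxes, and the function names are assumptions chosen for illustration; the
proof itself works with an arbitrary measurable $E$ and an arbitrary
countable cover.

```python
from fractions import Fraction


def cover_triangle(n):
    """Cover E = {(x, y) : 0 <= y <= x < 1} by n half-open boxes.

    Column k is [k/n, (k+1)/n) x [0, (k+1)/n); any point of E whose x lies
    in that column has y <= x < (k+1)/n, so it lies in the box.
    Each box is stored as (x0, x1, y0, y1).
    """
    return [
        (Fraction(k, n), Fraction(k + 1, n), Fraction(0), Fraction(k + 1, n))
        for k in range(n)
    ]


def total_measure(boxes):
    """Sum of m(B_j) = |I_j| * |I'_j| over all boxes of the cover."""
    return sum((x1 - x0) * (y1 - y0) for x0, x1, y0, y1 in boxes)


def covered(boxes, x, y):
    """True when the point (x, y) lies in at least one box."""
    return any(x0 <= x < x1 and y0 <= y < y1 for x0, x1, y0, y1 in boxes)


m_E = Fraction(1, 2)    # area of the triangle E
eps = Fraction(1, 100)  # target accuracy
boxes = cover_triangle(50)
print(total_measure(boxes), m_E + eps)
```

Here the $n$ boxes have total measure $(n+1)/(2n) = m(E) + 1/(2n)$, so
$n = 50$ meets the bound $m(E) + \varepsilon$ with $\varepsilon = 1/100$.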